Automatically interacting with websites

here are some examples of scripts I wrote to automate interaction with various websites.
not all of these scripts work, and some never did; they are just experiments.

1. sneeuwhoogten

this is a very simple shell script that I scheduled with cron. I used it a couple of years ago to monitor snow heights in the czech republic.
#!/bin/sh
fn=/home/itsme/prj/sneeuw/logs/`date +%Y%m%d`.$$
GET http://verkeer2.anwb.org/ash/Tsjechie.html >$fn

2. multiguide

start of script not shown.
attempt at getting tv show information from a website.
my $station="46";
my $date="011027";

my $ua =LWP::UserAgent->new();
$ua->agent("Mozilla/4.76 [en] (X11; U; Linux 2.4.9 i686)");
my $jar= HTTP::Cookies->new();

my $rp1= $ua->request(GET "http://www.veronica.nl/cgi-bin/html/multiguide/show");
$jar->extract_cookies($rp1);

for my $page (qw(top complete_data bottom)) {
    my $rq= GET "http://www.veronica.nl/cgi-bin/html/multiguide/$page/tv/$station/$date/";
    $jar->add_cookie_header($rq);
    #$rq->header("Accept" => "image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, image/png, */*");
    #$rq->header("Accept-Language" => "en");
    #$rq->header("Accept-Encoding" => "gzip");
    #$rq->header("Accept-Charset" => "iso-8859-1,*,utf-8");
    #$rq->header("Pragma" => "no-cache");

    my $rp= $ua->request($rq);

    print $rp->content, "\n";
}

3. boetebase

another unfinished script, an attempt to parse an online database with information about how various offenses are fined. script linked here

4. google

example scripts of how to parse google output.

5. webobjects

here is a generic script that can be used to interact with webobjects-based servers. the general layout of webobjects urls is as follows:
http://<hostname>/cgi-bin/WebObjects/<applicationname.woa>/[ <instanceid> / ]<actiontype>
actiontype can be:
    wa = WODirectActionRequestHandler 
        .../wa[/<classname>][/<actionmethod>]
         -> calls "<actionmethod>Action" on <classname>
         [ or "defaultAction" if no action specified ]
         [ or action on class DirectAction ]
    wo = WOComponentRequestHandler
        .../wo/[<pagename>/]<sessionid>/<contextid>.<elementid>
    WebServerResources = WOResourceRequestHandler
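
as an illustration of the layout above, here is a minimal sketch that splits a webobjects url into its named parts. parse_wo_url is a hypothetical helper, not the generic script itself, and the example url is made up. it assumes the optional instance id is numeric. for a 'wa' url with a single trailing name it cannot tell a class from an action without knowing the server, so it stores that name as class_or_action.

```perl
#!/usr/bin/perl -w
use strict;

# hypothetical helper: split a webobjects url into the parts of the layout above
sub parse_wo_url {
    my ($url)= @_;
    my ($host, $app, $rest)=
        ($url =~ m{^http://([^/]+)/cgi-bin/WebObjects/([^/]+\.woa)(?:/(.*))?$})
        or return;
    my @parts= split m{/}, defined $rest ? $rest : "";
    my %info= (host=>$host, app=>$app);

    # optional instance id before the action type (assumed to be numeric)
    $info{instance}= shift @parts if @parts && $parts[0] =~ /^-?\d+$/;

    my $type= shift @parts;
    return \%info unless defined $type;
    $info{type}= $type;

    if ($type eq "wa") {
        if (@parts >= 2) {
            @info{qw(class action)}= @parts[0,1];
        }
        elsif (@parts == 1) {
            # ambiguous: either a class (with defaultAction) or an action on DirectAction
            $info{class_or_action}= $parts[0];
        }
    }
    elsif ($type eq "wo") {
        # the last part is <contextid>.<elementid>
        my $last= pop @parts;
        if (defined $last && $last =~ /^([^.]+)\.(.+)$/) {
            @info{qw(context element)}= ($1, $2);
        }
        $info{session}= pop @parts;
        $info{page}= pop @parts if @parts;   # optional page name
    }
    return \%info;
}

my $u= parse_wo_url("http://www.example.com/cgi-bin/WebObjects/Shop.woa/1/wo/Main/abc123/4.0.7");
print join(", ", map { "$_=$u->{$_}" } sort keys %$u), "\n";
```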

6. omroepnl

and yet another unfinished script to get radio and tv-show information.

7. anwb

script to get the current traffic intensities for the west of holland. I took about 2 weeks worth of these pictures and put them together to form a time-lapse movie of traffic.
this script was scheduled every 15 minutes using cron.
#!/usr/bin/perl -w

use strict;

use POSIX;
use Time::Local;

use LWP::UserAgent;
use HTTP::Request::Common qw(POST GET);
use LWP::Simple;
use URI;
use Digest::MD5 qw(md5_hex);

chdir "/home/itsme/prj/sites/anwb/archive";

my $homepage=get "http://www.anwb.nl/servlet/Satellite?pagename=OpenMarket/ANWB_verkeer/PopupVerkeer&regio=randstad";

my ($imgfile)= ($homepage =~ m{
    # [...]
    open IMG, ">", $filename or die "open: $filename : $!\n";
    print IMG $img;
    close IMG;
}

8. egroups

see here for my other page on egroups ( or yahoogroups as it is currently called )
this is a script intended to make a copy of a mailinglist archive.
this script was never quite finished, the 'login' part is still missing. this can be worked around by logging in manually in a browser, and then copying the cookie to this script.
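
the cookie workaround could look roughly like the sketch below. this is not the actual egroups script: archive_request is a hypothetical helper, the url is a made-up example, and NAME=VALUE stands for whatever cookie string the browser holds after a manual login.

```perl
#!/usr/bin/perl -w
use strict;
use HTTP::Request::Common qw(GET);

# placeholder for the cookie copied by hand from the browser, not a real cookie
my $browser_cookie= "NAME=VALUE";

# hypothetical helper: build a request for one page and attach the copied cookie
sub archive_request {
    my ($url, $cookie)= @_;
    my $rq= GET $url;
    $rq->header("Cookie" => $cookie);
    return $rq;
}

# made-up address, only to show the shape of the request
my $rq= archive_request("http://groups.example.com/group/somelist/messages/1", $browser_cookie);
print $rq->as_string;

# to actually fetch the page ( needs 'use LWP::UserAgent;' ):
# my $ua= LWP::UserAgent->new();
# my $rp= $ua->request($rq);
# print $rp->content if $rp->is_success;
```

setting the header by hand is the simplest option when there is only one cookie to carry over. with several cookies an HTTP::Cookies jar, as used in the multiguide script, is probably less error-prone.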

10. foksuk

script to archive the fokke+sukke cartoons
first there was this script, combined with this script to create indexes.
later both were combined in this perl script.

11. kieswijzer

see this page for more information on these scripts.

12. hotmail

script to log in to hotmail, and (sort of) list the contents of the mailbox. I had plans to write an automated hotmail account creator, but this has become more difficult since microsoft is now using captchas to prevent scripted registration. there are a few possible ways around this though.
  1. visual captchas can be broken
  2. I noticed that the number of different captchas returned is limited, as if they keep a small number of valid captchas around for a couple of minutes. this would make it possible to create many accounts by manually recognizing just 1 captcha
there are other, better and more finished scripts, like httpmail.

13. trafficnet

this is a combination of a simple scheduled job
#!/bin/sh
cd /home/itsme/prj/sites/trafficnet
name=`date +%Y%m%d%H%M.%W.%w`
/usr/local/bin/wget -a trafficnet.log -O "daily/$name.html" -N http://maps.trafficnet.nl/asp/trafficstats.asp
and this script to create an overview of it.

14. girotel

this project started out by analysing the protocol used by the online banking system of the 'postbank'.
later they made their service available over the internet, leading me to create these scripts

15. sneak

Here I try to predict what movie will be playing here in delft next week. I don't think I ever guessed correctly. this script will make it easier to make wrong guesses based on hard data.

16. cia

here is an unfinished script for parsing parts of the cia world factbook

17. maps

here are some attempts to create bigger maps from small maps delivered by some websites. One problem I encountered is that big maps are square, while big parts of the earth are not, so it is impossible to match them up accurately.

18. chicon

here is a script to sort certain items from my local computer hardware store by price per unit of their most significant attribute ( like speed for cpus, mb for ram, and gb for hds ).
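
the sorting idea itself can be sketched as follows. the item list is made up, and by_unit_price is a hypothetical helper rather than the code of the chicon script, which has to get its data from the store's pages first.

```perl
#!/usr/bin/perl -w
use strict;

# made-up items: name, price, and the size of the attribute that matters ( gb here )
my @items= (
    { name=>"disk A", price=>120, size=>80 },
    { name=>"disk B", price=>90,  size=>40 },
    { name=>"disk C", price=>200, size=>160 },
);

# hypothetical helper: return the items with the lowest price per unit first
sub by_unit_price {
    my @list= @_;
    return sort { $a->{price}/$a->{size} <=> $b->{price}/$b->{size} } @list;
}

for my $item (by_unit_price(@items)) {
    printf "%-8s %7.2f per gb\n", $item->{name}, $item->{price}/$item->{size};
}
```

the biggest disk comes out first here even though it has the highest price, which is the point of sorting per unit rather than by price.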

19. wetenschapskwis

here is a script to automatically take part in the vpro wetenschapskwis. it should get 'smarter' if you let it run longer.

an improved version, which keeps track of known answers in a small database, is here. this version also identifies itself to the server as 'wetenschapsbot'.
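
a rough sketch of how remembering answers makes such a bot 'smarter' over time. it is an illustration, not the real script: choose_answer and learn are hypothetical names, the hash stands in for the small database, and it assumes the quiz reveals the correct answer after each attempt.

```perl
#!/usr/bin/perl -w
use strict;

# question text => correct answer; stands in for the small database
my %known;

# hypothetical helper: use the remembered answer when there is one, otherwise guess
sub choose_answer {
    my ($question, @options)= @_;
    return $known{$question} if exists $known{$question};
    return $options[int rand @options];
}

# hypothetical helper: remember the correct answer
# ( assumes the quiz reveals it after each attempt )
sub learn {
    my ($question, $correct)= @_;
    $known{$question}= $correct;
}

my $q= "example question?";
my $first= choose_answer($q, "a", "b", "c");    # a random guess
learn($q, "b");
my $second= choose_answer($q, "a", "b", "c");   # now taken from %known
print "first try: $first, second try: $second\n";
```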

20. phpbb

I wrote this script to parse phpbb forum articles from html pages recovered from browser caches and google: parsephpbbforum.pl

how to avoid being scriptable

these will not permanently solve scriptability problems, but at least postpone them.