Sunday, December 03, 2006

ripping web sites with perl

I was working on a small project and thought I'd share part of its code. A lot of web sites, including some travel web sites, like to display the local weather. Very often this data is taken from global weather web sites. Since we have our own local met department web site, I thought we could use that instead. All you need to do is modify the script's output to make it presentable on a web page.

It is also an example of how easy it is to rip data off external web sites using Perl: the script fetches the forecast page, pulls the text out of the forecast table, and then reads the current conditions from an XML feed.

use strict;
use warnings;
use LWP::Simple qw(get);
use HTML::TreeBuilder;
use XML::Simple;

# Forecast page of the local met department (URL left blank here)
my $url  = '';
my $page = get($url) or die "Could not fetch the forecast page\n";

# Parse the page and pick out the forecast table, which has width="656"
my $p = HTML::TreeBuilder->new_from_content($page);
my @tables = $p->look_down(
    _tag  => 'table',
    width => '656',
);

# Keep the trimmed text of each <td> cell, one entry per cell
my @lines;
for my $table (@tables) {
    my @cells = $table->look_down( _tag => 'td' );
    @lines = map { $_->as_trimmed_text } @cells;
}
$p->delete;    # free the parse tree, we don't need it anymore

# The cells alternate label and value; print the first three pairs
print "\nWeather Forecast for Maldives\n";
for ( my $r = 0; $r < 6 && $r + 1 < @lines; $r += 2 ) {
    print "$lines[$r] : $lines[$r + 1]\n";
}

# Current conditions come from an XML feed (URL left blank here)
my $ST   = '';    # station code, used as a key under STATIONS (not given in the original)
my $raw  = get('') or die "Could not fetch the XML feed\n";
my $data = XML::Simple->new->XMLin($raw);
my $stn  = $data->{STATIONS}{$ST};

print "\n";
print 'Temp. for ' . $ST . ': ' . $stn->{TEMPERATURE}{CELSIUS} . ' . '
    . 'Wind speed: ' . $stn->{WIND} . ' . '
    . 'Sun Rise: ' . $stn->{SUN}{RISE}
    . ' and set: ' . $stn->{SUN}{SET} . ' . '
    . 'Rain Fall: ' . $stn->{RAINFALL} . ' . '
    . 'Humidity: ' . $stn->{HUMIDITY};
print "\n";
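Since the page URLs are left blank above, here is a minimal sketch for trying the table-scraping step offline. It runs the same look_down / as_trimmed_text calls against an inline HTML string. The sample markup, its cell contents and the extract_pairs helper are hypothetical; the sketch only assumes, as the script does, that the forecast table has width="656" and that its cells alternate label and value.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical sample markup standing in for the met department page.
# The second table has a different width and should be ignored.
my $sample_html = <<'HTML';
<html><body>
<table width="100"><tr><td>Ignore</td><td>me</td></tr></table>
<table width="656">
  <tr><td>Today</td><td>Partly cloudy</td></tr>
  <tr><td>Tonight</td><td>  Light rain  </td></tr>
  <tr><td>Tomorrow</td><td>Sunny</td></tr>
</table>
</body></html>
HTML

# Hypothetical helper: collect the trimmed <td> text of every table with
# the given width attribute, then pair consecutive cells as [label, value].
sub extract_pairs {
    my ( $html, $width ) = @_;
    my $tree = HTML::TreeBuilder->new_from_content($html);
    my @cells;
    for my $table ( $tree->look_down( _tag => 'table', width => $width ) ) {
        push @cells, map { $_->as_trimmed_text } $table->look_down( _tag => 'td' );
    }
    $tree->delete;    # free the parse tree
    my @pairs;
    for ( my $i = 0; $i + 1 < @cells; $i += 2 ) {
        push @pairs, [ $cells[$i], $cells[ $i + 1 ] ];
    }
    return @pairs;
}

for my $pair ( extract_pairs( $sample_html, '656' ) ) {
    print "$pair->[0] : $pair->[1]\n";
}
```

This should print "Today : Partly cloudy", "Tonight : Light rain" and "Tomorrow : Sunny", one per line. as_trimmed_text strips the extra spaces around "Light rain". Once the parsing looks right, swap the sample string for the page fetched with get().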

1 comment:

Mohamed said...

This is awesome! Can you please give me a hint? I think it should be posted with the code; only then will we stop consuming useless weather stickers from non-Maldivian sites.