scrapies

For some time now Yarra Trams have had a little form on their website that allows you to get the arrival times for the next three trams for a specific tram service at a specific stop.

Unfortunately their site renders extremely badly on Firefox 3 and they don't provide a documented API, so the only way to get this data is to scrape it by submitting the form on their site and parsing the results.

Something else rather unfortunate is the lack of an HTML DOM parser in PHP. There is an XML one, but no such luck if you have messy HTML data to deal with. However, a convenient Dutchman known only as "Bart" has written a small tokenizer that comes in extremely handy here.

Hint: PEAR needs a generic HTML parser!

By hitting the form with curl and running the response through the tokenizer, I can now display the arrival times of the next three city trams at my local tram stop! Yay!
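For the curious, the parsing half of that looks roughly like the sketch below. It is in Python rather than the PHP that runs here, and it uses the standard library's forgiving `html.parser` in place of Bart's tokenizer. The markup is an assumption: I'm pretending each arrival time sits in a `<span class="arrival">`, which is not the real tramTRACKER page structure, and `ArrivalParser`, `parse_arrivals` and `SAMPLE` are names made up for the example. The form submission itself is left out.

```python
from html.parser import HTMLParser


class ArrivalParser(HTMLParser):
    """Collects the text of every <span class="arrival"> (hypothetical markup)."""

    def __init__(self):
        super().__init__()
        self._in_arrival = False
        self.arrivals = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; unquoted values work too.
        if tag == "span" and ("class", "arrival") in attrs:
            self._in_arrival = True

    def handle_endtag(self, tag):
        if tag == "span":
            self._in_arrival = False

    def handle_data(self, data):
        text = data.strip()
        if self._in_arrival and text:
            self.arrivals.append(text)


def parse_arrivals(html, limit=3):
    """Return at most `limit` arrival strings found in the page."""
    parser = ArrivalParser()
    parser.feed(html)
    parser.close()
    return parser.arrivals[:limit]


# Deliberately messy sample: unclosed tags and an unquoted attribute.
SAMPLE = """<table><tr><td><span class="arrival">2 min<td><span class=arrival>9 min</span>
<tr><td><span class="arrival">17 min</span><td><span class="arrival">25 min</span></table>"""

print(parse_arrivals(SAMPLE))
```

The point of a tokenizer-style parser is that it never tries to build a well-formed tree, so the unclosed tags in the sample don't trip it up.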

Of course the results are stored in memcache for a short time, just in case the other 12,000 people who use my stop decide to use this blog as the authoritative tram schedule info.
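The caching idea is simple enough to sketch too. This is a Python stand-in with a plain in-memory dict playing the role of memcache, not the code that runs on this blog. `TTLCache`, the 30-second lifetime and the cache key are all assumptions for illustration.

```python
import time


class TTLCache:
    """Tiny in-memory cache whose entries expire after `ttl_seconds`."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._store = {}    # key -> (expiry time, value)

    def get_or_fetch(self, key, fetch):
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]          # still fresh: skip the remote site
        value = fetch()              # stale or missing: scrape again
        self._store[key] = (now + self.ttl, value)
        return value


# Hypothetical usage: one scrape serves every visitor for 30 seconds.
cache = TTLCache(ttl_seconds=30)
times = cache.get_or_fetch("my-stop", lambda: ["2 min", "9 min", "17 min"])
print(times)
```

However many people load the page, the tram site sees at most one request per cache lifetime.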

Comments

tramTRACKER can no longer process additional web requests from this IP address as the limit has been exceeded.

Should you wish to have this limit removed, please contact Customer Feedback on 1800 800 166 or fill out the feedback form on this website stating the reason multiple requests are required.

Well, it would seem that their per-IP hit counter reset itself overnight, so the widget works again. For now, though, it's only enabled for me.

I can just see them lined up at the stop with their web browsers (Firefox, of course)... watching and waiting for the next tram. Any luck prodding the site's authors to fix up their website?

I haven't tried yet. I'll send 'em a mail today, see what they say :-)
