Automatically watching for updates on web pages ?

August 31st, 2006 by lucas

Once in a while, I come upon a web page that:

  1. Doesn’t offer an RSS/Atom feed.
  2. Doesn’t change very often.
  3. I would still like to keep track of.

I would like to be notified automatically when such pages change. websec does this:

Description: Web Secretary – Web page monitoring software
A visual Web page monitoring software. However, it goes
beyond the normal functionalities offered by such software. Not only
does it detect changes based on content analysis (instead of date/time
stamp or simple textual comparison), it will email the changed page to
you with the new content highlighted.

But:

  1. It sends emails. Generating an RSS feed would be much better.
  2. It sends HTML emails. OK, you can use AsciiMarker to view the changes with a text MUA, but still…

Does anybody know of another piece of software I could use?

12 Responses to “Automatically watching for updates on web pages ?”

  1. Zsoltik@ wrote on 08/31/06 at 5:35 pm :

    What about creating a feed with Feed43’s regexps?

  2. duff wrote on 08/31/06 at 5:40 pm :

    You could try emailing the website’s author(s) and asking them to include an RSS feed… they may not know about it.

  3. Matthew Nuzum wrote on 08/31/06 at 5:46 pm :

    A few months ago, when I was looking for work, I wrote a Python script that scraped job listings off a few company sites where I was interested in working. It would then summarize them into an e-mail, but it would be no work at all to make it also create an RSS feed. Here’s the quick and dirty code: http://rafb.net/paste/results/ljHElf22.html

    Note that it has to submit a form in one case in order to get the job listings. It would just print the results and, since it was running via cron, the output got e-mailed to me.

    The real brains here is the “mechanoid” library for python.
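The "also create an RSS feed" step mentioned above could look roughly like the sketch below. It is not the script from the paste link: the function name `build_rss` and the item keys `title`, `link` and `summary` are made up for illustration, and it only uses Python's standard library to emit a bare-bones RSS 2.0 document from items that some scraper has already collected.

```python
import xml.etree.ElementTree as ET
from email.utils import formatdate


def build_rss(title, link, description, items):
    """Return an RSS 2.0 document (as a string) for a list of item dicts.

    Each item is a dict with the hypothetical keys "title", "link"
    and "summary"; the scraping itself is assumed to happen elsewhere.
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description
    for entry in items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = entry["title"]
        ET.SubElement(item, "link").text = entry["link"]
        ET.SubElement(item, "description").text = entry["summary"]
        # RSS expects RFC 822 dates; here every item is stamped "now".
        ET.SubElement(item, "pubDate").text = formatdate(usegmt=True)
    return ET.tostring(rss, encoding="unicode")
```

Run from cron, the returned string could be written to a file served by any web server, and a feed reader pointed at that file instead of at a mailbox.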

  4. Michael 'abi' Ablassmeier wrote on 08/31/06 at 6:59 pm :

    hi,

    what about lwp-mirror from libwww-perl?

    abi@radiohead:~$ lwp-mirror http://www.grinser.de/~abi/test.html tmp
    abi@radiohead:~$ echo $?
    0
    abi@radiohead:~$ lwp-mirror http://www.grinser.de/~abi/test.html tmp
    lwp-mirror: tmp is up to date

    it’s quite simple, but might fit your needs

    bye,
    – michael
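lwp-mirror relies on a conditional HTTP request: it sends an `If-Modified-Since` header and skips the download when the server answers 304. A rough Python equivalent of that idea is sketched below. The function names `conditional_request` and `fetch_if_modified` are made up for this example, and it only helps for servers that honour the header, which many dynamically generated pages do not.

```python
import urllib.error
import urllib.request
from email.utils import formatdate


def conditional_request(url, last_fetch_time):
    """Build a GET request that asks only for content newer than last_fetch_time.

    last_fetch_time is a Unix timestamp (seconds since the epoch).
    """
    request = urllib.request.Request(url)
    request.add_header("If-Modified-Since",
                       formatdate(last_fetch_time, usegmt=True))
    return request


def fetch_if_modified(url, last_fetch_time):
    """Return the page body as bytes, or None if the server reports no change."""
    try:
        with urllib.request.urlopen(conditional_request(url, last_fetch_time)) as resp:
            return resp.read()
    except urllib.error.HTTPError as err:
        if err.code == 304:  # Not Modified
            return None
        raise
```

A cron job would store the time of the last successful fetch (for example as the modification time of the saved copy) and pass it in on the next run.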

  5. James wrote on 08/31/06 at 7:56 pm :

    My bookmarks page (linked above) is some nasty Python that diffs the current version of each page with a saved copy, and orders all pages by date. Source is available on request. I find it much better than an RSS reader, since I can use it anywhere, and I can ignore sites easily without the unread count going up and making me feel guilty.
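The "diff the current version with a saved copy" idea can be sketched in a few lines of Python. This is not James's code (which is not shown here), only an illustration of the technique: the function name `changed_lines` and the file layout are assumptions, and fetching the page is left to the caller.

```python
import difflib
from pathlib import Path


def changed_lines(saved_path, current_text):
    """Compare current_text with the copy saved at saved_path.

    Returns the unified diff as a list of lines (empty if nothing
    changed) and then overwrites the saved copy with current_text.
    A missing saved copy is treated as an empty page.
    """
    path = Path(saved_path)
    old_text = path.read_text(encoding="utf-8") if path.exists() else ""
    diff = list(difflib.unified_diff(
        old_text.splitlines(), current_text.splitlines(),
        fromfile="saved", tofile="current", lineterm=""))
    path.write_text(current_text, encoding="utf-8")
    return diff
```

The page text could come from `urllib.request.urlopen(url).read().decode("utf-8", "replace")`; a non-empty result means the page changed since the last run. Pages that embed timestamps or rotating ads will show up as changed every time, so some filtering of the text before comparison is likely needed in practice.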

  6. Vincent wrote on 09/1/06 at 12:00 am :

    I’m using a custom Python script to generate customized feeds.

  7. Brian Ewins wrote on 09/1/06 at 12:32 am :

    Ah, kids these days… everybody did this before RSS became ubiquitous. One system that did it was newsclipper.

    …disclaimer: I wrote one of newsclipper’s modules, but to be honest I threw newsclipper away and wrote our aggregator myself. It’s easy enough to do, and necessary when the page you’re scraping is weird. Today, I’d try Feed43 first.

  8. mati wrote on 09/1/06 at 1:47 pm :

    There is also a Firefox extension – Notify

  9. Scott Tankard wrote on 09/2/06 at 1:26 am :

    A little utility I stumbled across a while ago is Specto (specto.sf.net). It does the basics of watching web pages and will likely do a whole lot more pretty soon (especially if you help!).

    It has a nice and simple PyGTK gui, and docks in the notification area.

  10. Wataru Tenga wrote on 09/2/06 at 10:32 am :

    Windows has a German program, WebSite-Watcher, which monitors (and highlights) changes in Web sites as well as any program I’ve ever tried. I don’t think the author is interested in a Linux port, but this is really the level of functionality I would like to have on Linux, to keep me from slipping back to Windows.

    Wataru Tenga, Tokyo

  11. Tilmann Hentze wrote on 09/2/06 at 1:08 pm :

    I use a shell-script, run by cron.

  12. Phil wrote on 09/2/06 at 3:06 pm :

    http://www.rsspect.com/ sounds like the sort of thing you want.