Dear lazyweb,
I need a tool to mirror a website locally (so I can browse it offline). Requirements:
– not GUI-based (I want to run it in a script)
– support recursive retrieval and include/exclude lists (like wget)
– no output when everything is fine, but still output errors (not possible with wget, which still outputs “basic information” when running with --no-verbose, and doesn’t output errors when running with --quiet)
– understands timestamps, and retransfers files if timestamps or sizes don’t match
– not too over-engineered, not too badly maintained, etc…
Thank you.
try lftp
Httrack may be able to do what you want, although I never worried about the output too much. It’s in the repos.
http://www.httrack.com/
The other thing you may want to look at is whether redirection of stdout/stderr would work for you.
http://tldp.org/HOWTO/Bash-Prog-Intro-HOWTO-3.html
Simple wget will do the trick for you. The other one that works well for me is httrack.
try this:
wget -cbm http://www.mysite.com
Well, wget doesn’t work for me, because it outputs stuff even when you run it with -nv and everything goes fine. For example, I’d like to run it in a cron job and get a mail if something went wrong.
I’ll try lftp and httrack, thank you.
I think wwwoffle meets your requirements. Being a proxy with an offline mode it’s quite a bit different, but you can simply add (even recursive) fetch requests from command line.
Helmut
I’ll suggest httrack too.
Would wget be a valid solution if it weren’t for those output messages you mention? Maybe it’s something to raise with wget upstream; it might be possible to make them optional or something.
Greetings,
Miry
I tried lftp, but lftp doesn’t work well if the website I’d like to mirror isn’t a simple list of files. If there’s an index.html file with some info on it, that file isn’t mirrored, even if lftp follows its links.
httrack doesn’t look too bad, but at the same time, it looks like a very big piece of java code.
Yes, Miriam, wget would work if the output messages problem was fixed. I’ve just filed a bug; see https://savannah.gnu.org/bugs/index.php?24293
Oops, apparently httrack isn’t Java. Still, it /feels/ slow (and is much slower than lftp; I’m not sure why).
You could just send STDOUT of wget to /dev/null and still get STDERR if something goes wrong:
wget -m http://site.com/ > /dev/null
To mute both errors and standard output:
wget -m http://site.com/ > /dev/null 2>&1
or even just check the return value of wget(1) and send mail based on that.
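For the cron use case, a minimal sketch of that exit-status approach (the wrapper function, URL and mail address are placeholders of my own, not anything wget provides; in a real job you would pipe the message to mail(1)):

```shell
#!/bin/sh
# Run a command silently; speak up only when its exit status is nonzero.
# For the mirroring job this would be something like:
#   report_failure wget --mirror --quiet http://example.com/
# with the echo piped to mail(1) -- the URL is a placeholder.

report_failure() {
    "$@" >/dev/null 2>&1     # discard all output on the happy path
    status=$?
    if [ "$status" -ne 0 ]; then
        echo "command failed with exit status $status"
    fi
}

report_failure true     # prints nothing
report_failure false    # prints: command failed with exit status 1
```

Since cron mails any output of a job to its owner by default, printing only on failure already gives the “mail me when something went wrong” behaviour.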
You could use `wget -o file` and review the file if wget’s exit status isn’t zero. It’s always good to have a log, just in case; by redirecting everything to /dev/null you lose too much information.
sitecopy
It’s easy: you can access sites via WebDAV, FTP, HTTP and so on.
You can manage uploads or downloads for many sites.
Copy your site locally, modify pages, then `sitecopy --update mysite` to push the changes to the remote.
You could give pavuk a try. It might slightly fail the maintained bit, but for the rest I think it should do fine.
Httrack is great because it helps convert a dynamic site (like wiki or CMS) into something useable (read-only, of course) in a static mirror.
The pattern matching is great, and allows, for instance, narrowing a wiki mirror down to just what you need (no history and such).
More details here, for instance: http://www-public.it-sudparis.eu/~berger_o/weblog/2008/05/30/offline-backup-mediawiki-with-httrack/
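To illustrate those +/- filter patterns, a hedged sketch of an httrack invocation (the URL, output directory and patterns are hypothetical and untested, just showing the filter syntax):

```shell
# Mirror only the wiki pages, excluding page-history and special pages.
# URL, directory and patterns are placeholders for illustration.
httrack "http://example.com/wiki/" -O ./wiki-mirror \
    "+*example.com/wiki/*" "-*action=history*" "-*Special:*"
```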