Wednesday 24 April 2013

Download entire websites easily.

Use GNU Wget

Terminal

wget http://(website-name).org/

* replace the brackets and "website-name" with the name of the site you are downloading.

Wget can also download all images and other data nested within the site and linked from the top page. Use:

wget -r http://(website-name).org/

Some sites refuse to allow this and try to detect whether or not you are using a browser. The -U option lets Wget identify itself as one. Use:

wget -r -p -U Mozilla http://www.stupidsite.com/restricedplace.html

To avoid being blacklisted for hammering the site, use:

--wait=20     (waits 20 seconds between each retrieval in this example)

--limit-rate=20K      (limits the download rate; the value is in bytes per second, so add K to specify KB/s)

E.g.

wget --wait=20 --limit-rate=20K -r -p -U Mozilla http://www.stupidsite.com/restricedplace.html

To make sure Wget never ascends into folders above the one you are downloading, and so only fetches content at or below your starting directory, use:

--no-parent
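
Putting it all together, a polite recursive download restricted to a single directory might look like the sketch below. The URL is just a placeholder; adjust the wait time and rate limit to suit the site.

wget -r -p --no-parent --wait=20 --limit-rate=20K -U Mozilla http://www.stupidsite.com/restricedplace/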
