Using wget and cron to download webpages


Question

Ok, so I know I can use:

wget -r <website> > <file>

to get a webpage and save it. My question is: how would I use cron and wget to grab a webpage every hour, or even every minute, save the pages into a folder, zip and tarball them, and keep adding to the archive for review at a later date?

I know I can do this manually; my goal is basically to download it every 10-20 minutes, for roughly 4 hours (it doesn't matter if it runs longer), collect everything into a single directory, then zip that directory to conserve space, and check the pages later in the day.

Answer

Edit the cron table:

crontab -e

You can add an entry like this:

0,20,40 * * * *  wget -q URL -O ~/files/file-`date '+\%m\%d\%y\%H\%M'`.html

This downloads and saves the file every 20 minutes. (Note that % has a special meaning in crontab entries, so it must be escaped as \% as shown above.)
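You can confirm the job was installed with crontab -l, which lists the current user's cron table.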

Here is a small reference on crontab expressions so you can adjust the values.
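Each entry consists of five time fields followed by the command to run; as a quick sketch of the standard field layout:

# ┌───────── minute (0-59)
# │ ┌─────── hour (0-23)
# │ │ ┌───── day of month (1-31)
# │ │ │ ┌─── month (1-12)
# │ │ │ │ ┌─ day of week (0-6, Sunday = 0)
# │ │ │ │ │
# * * * * *  command to execute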

To TAR the files automatically, the crontab would be slightly more complex:

# fetch the page at :00, :20 and :40 into a per-day directory (mkdir -p creates it if needed)
0,20,40 * * * *  mkdir -p ~/files/`date '+\%m\%d\%y'` && wget -q URL -O ~/files/`date '+\%m\%d\%y'`/file-`date '+\%H\%M'`.html
# archive that day's directory once, at noon
0 12 * * *       tar cvf ~/archive-`date '+\%m\%d\%y'`.tar ~/files/`date '+\%m\%d\%y'`
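Note that tar cvf archives without compressing. Since the goal is to conserve space, one option is to add the z flag so the archive is gzipped as well:

0 12 * * *       tar czvf ~/archive-`date '+\%m\%d\%y'`.tar.gz ~/files/`date '+\%m\%d\%y'`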

The tar job above runs at noon; if you want to run it at midnight it's slightly more complex, because then you need to TAR the previous day's directory, but I think you get the idea from this.
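If you do want the midnight variant, here is a minimal sketch assuming GNU date (whose -d option understands "yesterday"):

# at 00:00, archive the directory the previous day's downloads went into (GNU date assumed)
0 0 * * *        tar cvf ~/archive-`date -d yesterday '+\%m\%d\%y'`.tar ~/files/`date -d yesterday '+\%m\%d\%y'`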

