Wget: Skip download if file already exists?

Question

Answers to "Skip download if files exist in wget?" say to use -nc, or --no-clobber, but -nc doesn't prevent the HTTP request from being sent or the file from being downloaded. It just does nothing with the downloaded file if the file has already been fully retrieved. Is there any way to prevent the HTTP request from being made at all if the file already exists?

I installed wget 1.16.3 with Homebrew. After I ran the command below, wget printed something like "making HTTP request" for each file that already existed, appeared to download it, and then printed something like "file already retrieved, nothing to do".

wget --user-agent='Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_4) AppleWebKit/600.7.12 (KHTML, like Gecko) Version/8.0.7 Safari/600.7.12' \
     --tries=1 \
     --no-clobber \
     --continue \
     --wait=0.3 \
     --random-wait \
     --adjust-extension \
     --load-cookies cookies.txt \
     --save-cookies cookies.txt \
     --keep-session-cookies \
         --recursive \
         --level=inf \
         --convert-links \
         --page-requisites \
         --reject=edit,logout,rate \
         --domains=example.com,s3.amazonaws.com \
         --span-hosts \
         --exclude-directories=/admin \
     http://example.com/

Recommended answer

The -nc option does what you're asking for, at least in wget 1.19.1.

On my server, I have a file called index.html which contains links to a.html and b.html.
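For anyone who wants to repeat this experiment, a test site like the one described can be sketched as follows. The helper name make_test_site and the page contents are made up for illustration; the answer does not show the actual files.

```shell
# Hypothetical helper (not from the answer): create a tiny three-page site
# in the directory given as $1.
make_test_site() {
    dir=$1
    mkdir -p "$dir" || return 1
    # index.html links to the two other pages, matching the description above.
    printf '%s\n' '<a href="a.html">a</a>' '<a href="b.html">b</a>' > "$dir/index.html"
    echo '<p>page a</p>' > "$dir/a.html"
    echo '<p>page b</p>' > "$dir/b.html"
}
```

One way to serve it, assuming Python 3 is installed, is to run make_test_site site, then cd site and python3 -m http.server 8000. The answer does not say which server was used, so treat that choice as an assumption.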

$ wget -r -nc http://127.0.0.1:8000/

The server log shows:

127.0.0.1 - - [23/Mar/2017 17:51:25] "GET / HTTP/1.1" 200 -
127.0.0.1 - - [23/Mar/2017 17:51:25] "GET /robots.txt HTTP/1.1" 404 -
127.0.0.1 - - [23/Mar/2017 17:51:25] "GET /a.html HTTP/1.1" 200 -
127.0.0.1 - - [23/Mar/2017 17:51:25] "GET /b.html HTTP/1.1" 200 -

Now I remove b.html and run it again:

$ rm 127.0.0.1\:8000/b.html
$ wget -r -nc http://127.0.0.1:8000/

The server log shows:

127.0.0.1 - - [23/Mar/2017 17:51:38] "GET /robots.txt HTTP/1.1" 404 -
127.0.0.1 - - [23/Mar/2017 17:51:38] "GET /b.html HTTP/1.1" 200 -

As you can see, only b.html was requested (apart from robots.txt); index.html and a.html already existed locally and were not fetched again.
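The test above covers a recursive download. For a single URL whose local filename is known in advance, the existence check can also be done in the shell before wget is started, which guarantees that no request is sent. The following is only a minimal sketch: the function name fetch_if_missing is hypothetical, and both the URL and the destination path have to be passed explicitly.

```shell
# Hypothetical helper (not part of wget): fetch_if_missing URL DEST
# Skips wget entirely, so no HTTP request is made, when DEST already
# exists and is non-empty.
fetch_if_missing() {
    url=$1
    dest=$2
    if [ -s "$dest" ]; then
        echo "skip: $dest already exists"
        return 0
    fi
    # --output-document writes the response body to DEST.
    wget --output-document="$dest" "$url"
}
```

It would be called as, for example, fetch_if_missing http://example.com/a.html a.html. The -s test treats an empty file as missing, on the assumption that an empty file is the leftover of a failed download; use -e instead if empty files are valid results.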
