wget with python time limit

Question

I have a large text file of URLs which I have to download via wget. I have written a small Python script which basically loops through each domain name and downloads it using wget (os.system("wget "+URL)). The problem is that wget just hangs if the remote server doesn't reply after the connection is made. How do I set a time limit in such a case? I want to terminate wget after some time if the remote server does not reply after connecting.

Regards,

Answer

This seems to be less a question about Python and more a question about how to use wget. In GNU wget, which you are likely using, the default number of retries is 20. You can set the number of tries with -t; for example, wget -t1 will give up after a single attempt and quickly skip a file that fails to download (note that -t0 means *infinite* retries in GNU wget, not zero). Alternatively, you could use the -S flag to get the server response and have Python react appropriately. But the most helpful option for you is -T, or --timeout: set it to -T10 to have wget time out after ten seconds and move on.
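If you want to keep the Python loop, the wget flags above can be combined with a hard kill from the Python side using subprocess with a timeout, which is more robust than os.system. A minimal sketch (the helper name run_with_limit and the 60-second backstop are illustrative, not from the original):

```python
import subprocess

def run_with_limit(cmd, limit):
    """Run cmd as a subprocess, killing it if it exceeds `limit` seconds.
    Returns True on a zero exit code, False on failure or timeout."""
    try:
        return subprocess.run(cmd, timeout=limit).returncode == 0
    except subprocess.TimeoutExpired:
        # subprocess.run has already killed the child at this point.
        return False

# wget itself retries twice (-t2) and gives up on a stalled
# connection after 10 seconds (-T10); the Python-side limit is a
# hard backstop in case wget hangs anyway. Example call (not run here):
# run_with_limit(["wget", "-t2", "-T10", url], limit=60)
```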

If all you are doing is iterating through a list of URLs and downloading each one, I would just use wget; there is no need for Python here. In fact, you can do it in one line:

awk '{print "wget -t2 -T5 --append-output=wget.log \"" $0 "\""}' listOfUrls | bash

This runs through the list of URLs and calls wget on each one, where wget tries to download the file at most twice (-t2) and waits no more than 5 seconds for the server before terminating the connection (-T5). It also appends its output to wget.log, which you can grep at the end to look for 404 errors.
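The final grep step can equally be done from Python once wget.log exists. A small sketch; the sample lines below are abridged, invented examples of the response lines wget writes, not real log contents:

```python
# Abridged, invented examples of lines wget writes to its log:
sample_log = [
    "HTTP request sent, awaiting response... 200 OK",
    "HTTP request sent, awaiting response... 404 Not Found",
    "Saving to: 'index.html'",
]

def not_found_lines(lines):
    """Return only the log lines reporting an HTTP 404 response."""
    return [line for line in lines if "404" in line]

# With a real log you would pass open("wget.log") instead of sample_log.
print(not_found_lines(sample_log))
```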
