Discard curl downloads below a minimum file size
Question
I regularly need to iterate through some image downloads via curl and want to discard those downloads that are below a certain file size, because they obviously don't exist, i.e. the returned "image" is a 404 page. curl has a --max-filesize option, but not one for a minimum file size.
I was thinking of something where I would pass the URL of the image and the minimum file size to a script curlmin, such as
curlmin url minfilesize
I believe that since I'm using alphanumeric sequences in [ ] ranges to iterate through days, I need to be able to delete the files below a certain file size "on the fly". My attempt below is anything but complete, because I'm stuck on how to reference the "current" file in the if statement and how to actually delete the file. But it may well be that my one-liner wouldn't work in the first place either.
#!/bin/bash
curl -O $1 | if [ $(wc -c <"$1") -le $2 ]; then delete_file_here; fi
                            ^^                  ^^^^^^^^^^^^^^^^
Any ideas? Thanks.
Recommended answer
If you just want curl to avoid creating a local file when the remote one doesn't exist, or when there was an error requesting or downloading it, just use the -f (--fail) flag:
curl -fO "$1"
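A minimal sketch of how -f might be used from a script, assuming one URL per call; fetch_image is a hypothetical name, not part of curl. When -f rejects an HTTP error status, curl exits with code 22, so the wrapper can report those URLs. Note that -f cannot catch a server that answers a missing image with 200 and an HTML page, which is where the size checks below come in.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around the -f approach: download one URL and report
# when curl gave up because of an HTTP error status.
fetch_image() {
  local url=$1 status=0
  # -f: on an HTTP status of 400 or above, write no body and exit with code 22
  # -sS: hide the progress meter but still print curl's own error message
  curl -fsSO "$url" || status=$?
  if (( status == 22 )); then
    printf 'skipped (HTTP error): %s\n' "$url" >&2
  fi
  return "$status"
}
```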
If, however, you want to remove files below a certain size, here are a couple of options:
You can get the filename by using parameter expansion to strip everything up to and including the last /, as in ${1##*/}.
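To see what that expansion produces, here is a small sketch; url_basename and the example.com URL are hypothetical. For a plain URL without a query string, the result should be the same name that curl -O saves to.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the part of a URL after its last '/'.
# ${var##*/} deletes the longest prefix matching '*/', i.e. everything
# up to and including the final slash.
url_basename() {
  local url=$1
  printf '%s\n' "${url##*/}"
}

url_basename 'https://example.com/images/2024-01-15.jpg'   # prints 2024-01-15.jpg
```

If the URL ends in a slash, the expansion yields an empty string, so there is no filename to check.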
So you could do:
curl -O "$1"
if [[ $(wc -c < "${1##*/}") -le $2 ]]; then
rm -f "${1##*/}"
fi
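The same check can be wrapped into the curlmin script the question asks for. This is a sketch under the assumption that a single, non-glob URL is passed (with a [ ] range, ${url##*/} would not be a real filename); curlmin and discard_if_small are hypothetical names.

```bash
#!/usr/bin/env bash
# Hypothetical helper: delete FILE if it holds MIN_BYTES bytes or fewer,
# the same -le test as the snippet above.
discard_if_small() {
  local file=$1 min_bytes=$2 size
  [[ -f $file ]] || return 0          # nothing was saved, so nothing to discard
  size=$(wc -c < "$file")             # reading via stdin makes wc print only the count
  if (( size <= min_bytes )); then    # arithmetic context ignores wc's padding spaces
    rm -f -- "$file"
  fi
}

# Hypothetical curlmin wrapper matching the question's "curlmin url minfilesize".
curlmin() {
  local url=$1 min_bytes=$2
  curl -O "$url" || return            # -O names the file after the URL's last path segment
  discard_if_small "${url##*/}" "$min_bytes"
}
```

Saved as a script, the last line would call curlmin "$1" "$2".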
Or we could check using find:
curl -O "$1"
find . -type f -name "${1##*/}" -size -"$2"c -delete
assuming your find supports -delete. Otherwise you could replace it with -exec rm -f {} + instead.
And if your curl command might download multiple files (for example with a [ ] range), you can easily adapt the find command, by dropping the -name test, to find all files in a directory tree that are smaller than the given size.
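A sketch of that adaptation, assuming the range is downloaded into its own directory so nothing else gets swept up; discard_small_in_dir is a hypothetical name. Note that -size -Nc matches files strictly smaller than N bytes, whereas the wc -c ... -le test above also removes files of exactly N bytes.

```bash
#!/usr/bin/env bash
# Hypothetical helper: remove every regular file under DIR (recursively)
# that is strictly smaller than MIN_BYTES bytes.
discard_small_in_dir() {
  local dir=$1 min_bytes=$2
  # -size -Nc means "fewer than N bytes"; the c suffix avoids 512-byte block rounding.
  # -exec rm -f {} + works where -delete is unavailable and passes names safely.
  find "$dir" -type f -size -"${min_bytes}"c -exec rm -f {} +
}
```

For example, run curl -O 'https://example.com/img/2024-01-[01-31].jpg' inside an otherwise empty downloads directory (the URL is only a placeholder), then call discard_small_in_dir downloads 1000.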
The safest path is to use find, since it won't be tripped up by strange filenames. Failing that, you could try another language that is better suited to these operations. If you are confident that you have only safe names, you could try the following:
curl -O "$1" 2>&1 | awk -v min_size="$2" '/-->/ { cmd = "stat -c%s " $NF; cmd | getline s; close(cmd); if (s < min_size) system("rm " $NF) }'
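which will pass $2 into awk as the variable min_size. Then we look at each line of curl output that contains -->; curl prints these lines (such as [1/3]: URL --> file when a [ ] range expands to several URLs) on stderr, which is why 2>&1 is needed. The last field of those lines (assuming safe names again) is the local filename, so we call stat on it to get the size (-c%s is GNU coreutils syntax; BSD and macOS stat use -f%z instead), closing each stat command after reading from it, and then check whether that size is below our min_size. If it is, we call rm on it, once again trusting that we have safe names and not ones that contain IFS or globbing characters or the like.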
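The awk one-liner checks each file as soon as its --> line appears. curl appears to print that line when each transfer starts rather than when it ends, so the check may run before the file is complete. A more cautious sketch, in bash rather than awk, only checks a file once the next --> line arrives or the input ends, and takes everything after the last " --> " as the name so that spaces survive (newlines in names still break it). It assumes transfers run one after another, and the helper names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: delete FILE if it exists and is smaller than MIN_BYTES bytes.
check_downloaded_file() {
  local file=$1 min_bytes=$2 size
  [[ -n $file && -f $file ]] || return 0
  size=$(wc -c < "$file")
  if (( size < min_bytes )); then
    rm -f -- "$file"
  fi
}

# Hypothetical filter: read curl's stderr on stdin, delete small downloads.
discard_small_from_curl_log() {
  local min_bytes=$1 line pending=""
  # Split only on newlines; progress-meter \r updates stay inside one line.
  while IFS= read -r line; do
    if [[ $line == *' --> '* ]]; then
      # A new header means the previous transfer has finished.
      check_downloaded_file "$pending" "$min_bytes"
      pending=${line##* --> }
    fi
  done
  # The last transfer has no following header, so check it once input ends.
  check_downloaded_file "$pending" "$min_bytes"
}
```

Usage would mirror the awk version: curl -O "$1" 2>&1 | discard_small_from_curl_log "$2".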