Discard curl downloads below a minimum file size

Question

I regularly need to iterate through some image downloads via curl and want to discard downloads below a certain file size, because those images obviously don't exist, i.e. the returned "image" is a 404 page. curl has a --max-filesize option, but no equivalent for a minimum file size.

I was thinking of a script, curlmin, to which I would pass the image URL and the minimum file size, such as

curlmin url minfilesize

Since I'm using curl's [ ] ranges to iterate through alphanumeric sequences of days, I believe I need to be able to delete files below a certain file size "on the fly". My attempt below is anything but complete, because I'm stuck on how to reference the "current" file in the 'if' statement and how to actually delete the file. It may well be that my one-liner wouldn't work in the first place either.

#!/bin/bash
curl -O $1 | if [ $(wc -c <"$1") -le $2 ]; then delete_file_here; fi
                            ^^                  ^^^^^^^^^^^^^^^^

Any ideas? Thanks.

Recommended answer

If you just want curl to avoid creating a local file when the remote one doesn't exist, or when requesting or downloading it fails, just use the -f flag:

curl -fO "$1"
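As a minimal sketch of how a script might act on that flag: with -f, curl exits with a non-zero status (22 for HTTP responses of 400 and above) instead of saving the error page, so a wrapper can report the failure and move on. The function name fetch_or_skip is hypothetical and not part of curl.

```shell
#!/bin/bash
# fetch_or_skip is a hypothetical helper name, used here only for illustration.
# -f makes curl fail on HTTP error responses instead of saving the error page;
# -O saves the download under the remote file name.
fetch_or_skip() {
    local url=$1 status
    curl -fO "$url"
    status=$?
    if [ "$status" -ne 0 ]; then
        printf 'skipped %s (curl exit status %d)\n' "$url" "$status" >&2
    fi
    return "$status"
}

# Example call (commented out so that nothing is downloaded by accident):
# fetch_or_skip 'http://example.com/images/2020-01-05.jpg'
```

Note that -f only helps when the server answers with a real error status. A "404 page" delivered with status 200 is still saved, which is where a size check becomes useful.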

If, however, you want to remove files below a certain size, here are a couple of options:

You can get the filename by using parameter expansion to strip everything up to the last /, as in ${1##*/}, so you could do

curl -O "$1"
if [[ $(wc -c < "${1##*/}") -le $2 ]]; then
    rm -f "${1##*/}"
fi
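Put together as a complete script, the snippet above might look like the following sketch. It assumes a single plain URL (no [ ] ranges), so that ${url##*/} really is the local file name. Adding -f and wrapping the logic in a function named after the asker's curlmin are choices made here for illustration.

```shell
#!/bin/bash
# curlmin (sketch): download one URL and delete the result if its size
# is less than or equal to the given minimum.
# Usage: curlmin url minfilesize
curlmin() {
    local url=$1 min_size=$2
    local file=${url##*/}            # strip everything up to the last "/"
    curl -fO "$url" || return        # on failure, pass curl's exit status on
    local size
    size=$(( $(wc -c < "$file") ))   # arithmetic expansion drops any padding from wc
    if [ "$size" -le "$min_size" ]; then
        rm -f -- "$file"
    fi
}

# Run only when called with exactly two arguments.
if [ "$#" -eq 2 ]; then
    curlmin "$1" "$2"
fi
```

Saved as an executable file named curlmin, it would be called as curlmin url minfilesize, with the size given in bytes.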

Or we could check using find:

curl -O "$1"
find . -type f -name "${1##*/}" -size -"$2"c -delete

assuming your find supports -delete. Otherwise you could replace that with -exec rm -f {} +

and if your curl command might download multiple files, you can easily adapt the find command to find all files in a directory structure smaller than the given size.
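One way that adaptation might look, as a sketch: drop the -name test and point find at the download directory. The function name prune_small is hypothetical. It uses -exec rm -f {} + rather than -delete so that it does not depend on find supporting -delete.

```shell
#!/bin/bash
# prune_small is a hypothetical helper name, used here only for illustration.
# Deletes every regular file under dir (recursively) that is smaller than
# min_size bytes. "-size -Nc" matches files of strictly fewer than N bytes.
prune_small() {
    local dir=$1 min_size=$2
    find "$dir" -type f -size -"${min_size}"c -exec rm -f {} +
}

# Example call (commented out): remove downloads under 1000 bytes
# prune_small ./downloads 1000
```

Because this removes every small file under the directory, not only the ones curl just wrote, it is safest to run it in a directory used for these downloads alone.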

The safest path is to use find, since it won't be tripped up by strange filenames. Failing that, you could try another language that is better able to support these operations. If you are confident that you have only safe names, you could try the following:

curl -O "$1" 2>&1 | awk -v min_size="$2" '/-->/ {"stat -c%s " $NF | getline s; if(s < min_size) {system("rm " $NF);}}'

which will pass $2 into awk as the variable min_size. Then we look at each line of curl output that contains -->. The last field of those lines (assuming safe names again) is the local filename, so we call stat on it to get the size (-c%s, which is GNU stat syntax) and then check whether that size is below min_size. If it is, we call rm on it, once again trusting that we have safe names and not ones that contain IFS or globbing characters or the like.
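To see what the awk pattern picks up, the field extraction can be tried on its own. The sample lines below are made up for illustration and assume curl's per-URL progress lines have the form "[n/total]: url --> filename".

```shell
#!/bin/bash
# Illustrative sample of curl's per-URL lines; URLs and file names are made up.
sample='[1/2]: http://example.com/img/a1.jpg --> a1.jpg
[2/2]: http://example.com/img/a2.jpg --> a2.jpg'

# For each line containing "-->", print the last whitespace-separated field,
# which is what the one-liner treats as the local file name.
names=$(printf '%s\n' "$sample" | awk '/-->/ {print $NF}')
printf '%s\n' "$names"
```

A file name containing a space would be cut at its last space by $NF, which is one reason this approach is only suitable for safe names.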
