wget loop where numbers in URL stay the same
Question
I would like to download a bunch of PDFs with wget in bash (version 3.2.57(1)-release) on a Mac. The PDFs are old newspaper articles, which were published almost every day between 1810 and 1816.
I tried the following command:
for i in {10..16}; do wget -A pdf -nc -E -nd --no-check-certificate http://digital.slub-dresden.de/fileadmin/data/453041671-18${i}0{1..9}0{1..9}/453041671-18${i}0{1..9}0{1..9}_tif/jpegs/453041671-18${i}0{1..9}0{1..9}.pdf http://digital.slub-dresden.de/fileadmin/data/453041671-18${i}{10..12}{10..31}/453041671-18${i}{10..12}{10..31}_tif/jpegs/453041671-18${i}{10..12}{10..31}.pdf; done
Unfortunately, the URL contains several numbers I need to iterate over, which makes the argument list grow until it eventually exceeds the maximum limit. For example, with
453041671-18${i}0{1..9}0{1..9}/453041671-18${i}0{1..9}0{1..9}_tif/jpegs/453041671-18${i}0{1..9}0{1..9}.pdf
I receive an "argument list too long" error message.
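To see why the list explodes, the sketch below (bash 3.2 compatible) counts the words that brace expansion produces. Bash expands every brace pair in a word independently and multiplies the results, so only the small patterns are actually expanded here; the full-URL count is computed arithmetically, because expanding it is exactly what fails. The 66 for {10..12}{10..31} is 3 * 22. The ARG_MAX value printed at the end varies between systems.

```shell
#!/usr/bin/env bash
# One 0{1..9}0{1..9} group expands to 9 * 9 = 81 words.
set -- 0{1..9}0{1..9}
per_group=$#

# Two groups in the same word multiply: 81 * 81 = 6561 words.
set -- 0{1..9}0{1..9}/0{1..9}0{1..9}
two_groups=$#

# Each URL in the question repeats its group three times, so for a single
# value of $i the counts are 81^3 and 66^3 ({10..12}{10..31} = 3 * 22 = 66).
first_url=$(( per_group * per_group * per_group ))
second_url=$(( 66 * 66 * 66 ))

echo "one group: $per_group, two groups: $two_groups"
echo "URLs per loop iteration: $(( first_url + second_url ))"   # 818937
echo "ARG_MAX on this system: $(getconf ARG_MAX) bytes"
```

At roughly 100 bytes per URL, several hundred thousand URLs need tens of megabytes of argument space, far beyond any usual ARG_MAX, and nearly all of them are mismatched dates anyway.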
If you take the above link snippet as an example, the only link that actually exists would be:
453041671-18000701/453041671-18000701_tif/jpegs/453041671-18000701.pdf
where the month has the same number in all three places (18000701), unlike this example:
453041671-18000801/453041671-18000701_tif/jpegs/453041671-18000701.pdf
or any other combination wget is trying.
How can I tell wget to use the same numbers in all three places of the URL in each iteration over the months {1..9} and {10..12}, respectively?
Answer
Brace expansions don't know about other brace expansions. You can't have multiple brace expansions in one word and have them change in tandem. Instead, you must use a for loop.
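A small bash demonstration of the difference, using a made-up word m1/m2 in place of the month: two brace pairs in one word produce every combination (a cartesian product), while a loop variable holds a single value that every reference in the word reuses.

```shell
#!/usr/bin/env bash
# Two brace pairs in one word expand independently: all combinations appear,
# including the mismatched ones.
words=( m{1..2}-m{1..2} )
echo "${words[@]}"     # m1-m1 m1-m2 m2-m1 m2-m2

# A loop variable is expanded to the same value everywhere it is referenced,
# so both positions always agree.
paired=()
for m in 1 2; do
    paired+=( "m$m-m$m" )
done
echo "${paired[@]}"    # m1-m1 m2-m2
```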
for year in {10..16}; do
    for month in $(seq -w 1 12); do
        for day in $(seq -w 1 31); do
            wget ... "453041671-18$year$month$day/453041671-18$year$month${day}_tif/jpegs/453041671-18$year$month$day.pdf"
            # The second day is in braces because otherwise it would parse as $day_tif.
        done
    done
done
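If the full URL is wanted, one option is a small helper that builds it from a single date stamp. This is a sketch, not part of the original answer: the function name build_url is hypothetical, and the base URL and the 453041671 ID are taken from the question. Pass plain numbers such as 7 rather than seq -w output such as 08, because bash's printf reads a leading 0 as octal and rejects 08 and 09.

```shell
#!/usr/bin/env bash
# Base URL and ID as they appear in the question.
base='http://digital.slub-dresden.de/fileadmin/data'
id='453041671'

# Hypothetical helper: print the PDF URL for one date.
# $1 = year suffix (10..16), $2 = month (1..12), $3 = day (1..31).
build_url() {
    local stamp
    # printf -v (bash 3.1+) zero-pads month and day without a subshell.
    printf -v stamp '18%02d%02d%02d' "$1" "$2" "$3"
    # The same $stamp fills all three places, so they cannot drift apart.
    printf '%s/%s-%s/%s-%s_tif/jpegs/%s-%s.pdf\n' \
        "$base" "$id" "$stamp" "$id" "$stamp" "$id" "$stamp"
}

build_url 10 7 1
```

Inside loops over {10..16}, {1..12} and {1..31}, the download line would then be wget -nc --no-check-certificate "$(build_url "$year" "$month" "$day")". Dates that do not exist, such as the 31st of February, simply fail to download.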
In case you want to reduce the number of spawned wget processes, you can replace wget with echo ... >> listing, and then use the --input-file (-i) option to get wget to pull the URLs from that file.
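A sketch of the listing approach, assuming the base URL and ID from the question; write_listing is a hypothetical name. It writes one URL per line for every year, month and day combination (7 * 12 * 31 = 2604 lines, including invalid dates, which will just fail), so a single wget process can then read them all.

```shell
#!/usr/bin/env bash
# Base URL and ID as they appear in the question.
base='http://digital.slub-dresden.de/fileadmin/data'
id='453041671'

# Hypothetical helper: write one URL per line for 1810-1816 into file "$1".
write_listing() {
    local out=$1 year month day stamp
    for year in {10..16}; do
        for month in {1..12}; do
            for day in {1..31}; do
                # Zero-padded date stamp, e.g. 18100701.
                printf -v stamp '18%02d%02d%02d' "$year" "$month" "$day"
                printf '%s/%s-%s/%s-%s_tif/jpegs/%s-%s.pdf\n' \
                    "$base" "$id" "$stamp" "$id" "$stamp" "$id" "$stamp"
            done
        done
    done > "$out"    # truncates the file, so reruns do not duplicate lines
}
```

Usage: write_listing listing, then wget -nc --no-check-certificate -i listing. Writing the file in one redirect instead of echo ... >> listing per line also avoids appending duplicates when the script is run twice.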