wget loop where numbers in URL stay the same


Question

I would like to download a batch of PDFs with wget in bash (version 3.2.57(1)-release) on a Mac. The PDFs are old newspaper issues, which were published almost every day between 1810 and 1816.

I tried the following command:

for i in {10..16}; do wget -A pdf -nc -E -nd --no-check-certificate http://digital.slub-dresden.de/fileadmin/data/453041671-18$i0{1..9}0{1..9}/453041671-18$i0{1..9}0{1..9}_tif/jpegs/453041671-18$i0{1..9}0{1..9}.pdf http://digital.slub-dresden.de/fileadmin/data/453041671-18$i{10..12}{10..31}/453041671-18$i{10..12}{10..31}_tif/jpegs/453041671-18$i{10..12}{10..31}.pdf; done

Unfortunately, the URL contains several numbers that I need to iterate over, which makes the argument list grow until it eventually exceeds the maximum length, e.g. with

453041671-18$i0{1..9}0{1..9}/453041671-18$i0{1..9}0{1..9}_tif/jpegs/453041671-18$i0{1..9}0{1..9}.pdf

and I receive an "argument list too long" error message.

Taking the link snippet above as an example, the only links that actually exist are of this form:

453041671-18000701/453041671-18000701_tif/jpegs/453041671-18000701.pdf

where the date is the same in all three places (18000701), unlike this example:

453041671-18000801/453041671-18000701_tif/jpegs/453041671-18000701.pdf

or any of the other combinations that wget tries.

How can I tell wget to use the same numbers in all three places for each iteration of the month, {1..9} and {10..12} respectively?

Recommended answer

Brace expansions don't know about each other. Each one expands independently, so you can't have several brace expansions in one word and make them change in tandem. Instead, you must use a for loop.

for year in {10..16}; do
  for month in $(seq -w 1 12); do
    for day in $(seq -w 1 31); do
      # ${day} is in braces before _tif because otherwise it would parse as $day_tif.
      wget ... 453041671-18$year$month$day/453041671-18$year$month${day}_tif/jpegs/453041671-18$year$month$day.pdf
    done
  done
done
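
To see why the original command produces so many arguments, here is a small illustration of how two brace expansions in one word combine. The shortened path is made up for the demonstration and is not the real URL layout.

```shell
#!/usr/bin/env bash
# Two brace expansions in a single word expand independently,
# so bash generates every pairing, not just the matching ones.
# The path below is a shortened, hypothetical stand-in for the real URL.
pairs=( 1800070{1..2}/1800070{1..2}.pdf )

printf '%s\n' "${pairs[@]}"
echo "count: ${#pairs[@]}"
```

Only two of the four generated paths have the same date on both sides. With six or more expansions per URL, as in the question, the number of combinations multiplies accordingly.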

If you want to reduce the number of wget processes that get spawned, you can replace wget with echo ... >> listing, and then use the --input-file (-i) option so that a single wget reads the URLs from that file.
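
The listing approach could be sketched roughly as follows. The function name generate_urls and the variable base are made up for this example, the host prefix is taken from the command in the question, and seq is used for the years as well only to keep the sketch portable. Impossible dates such as February 31 are generated too and will simply fail to download.

```shell
#!/usr/bin/env bash
# Sketch: print one candidate URL per line for every year/month/day.
# "base" and "generate_urls" are illustrative names, not part of wget.
base="http://digital.slub-dresden.de/fileadmin/data"

generate_urls() {
  local year month day id
  for year in $(seq 10 16); do
    for month in $(seq -w 1 12); do      # -w pads to equal width: 01..12
      for day in $(seq -w 1 31); do      # 01..31, including impossible dates
        id="453041671-18${year}${month}${day}"
        echo "${base}/${id}/${id}_tif/jpegs/${id}.pdf"
      done
    done
  done
}
```

A possible way to use it is generate_urls > listing followed by wget -nc --no-check-certificate -i listing, which downloads everything with a single wget process.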
