Getting directory listing over http
Question
There is a directory that is being served over the net which I'm interested in monitoring. Its contents are various versions of software that I'm using, and I'd like to write a script that I could run which checks what's there and downloads anything that is newer than what I've already got.
Is there a way, say with wget or something, to get a directory listing? I've tried using wget on the directory, which gives me HTML. To avoid having to parse the HTML document, is there a way of retrieving a simple listing like ls would give?
Answer
I've just found a way to do it:
$ wget --spider -r --no-parent http://some.served.dir.ca/
It's quite verbose, so you need to pipe through grep a couple of times depending on what you're after, but the information is all there. It looks like it prints to stderr, so append 2>&1 to let grep at it. I grepped for "\.tar\.gz" to find all of the tarballs the site had to offer.
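Putting those pieces together, the whole thing can be sketched as one pipeline (the URL is the example host from above; the grep pattern is one assumption about how you'd match tarball URLs in wget's crawl report):

```shell
# --spider crawls without saving pages; the report goes to stderr,
# so 2>&1 merges it into the pipe. grep -o prints only the matching
# part of each line, i.e. just the tarball URLs, which sort -u dedupes.
wget --spider -r --no-parent http://some.served.dir.ca/ 2>&1 \
  | grep -o 'http://[^ ]*\.tar\.gz' \
  | sort -u
```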
Note that wget writes temporary files in the working directory, and doesn't clean up its temporary directories. If this is a problem, you can change to a temporary directory:
$ (cd /tmp && wget --spider -r --no-parent http://some.served.dir.ca/)
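The parentheses run the cd in a subshell, so your working directory is untouched afterwards. And since the original goal was to download anything newer, wget's own timestamping can handle that part without any parsing at all; a sketch using standard wget options (-N, -nd, -A), with the same example URL:

```shell
# -N (timestamping) skips files no newer than the local copy,
# -nd flattens the remote directory structure into one directory,
# -A '*.tar.gz' restricts the recursive crawl to tarballs.
( cd /tmp && wget -r -N -nd --no-parent -A '*.tar.gz' http://some.served.dir.ca/ )
```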