Downloading multiple files in R with variable length, nested URLs


Question

New member here. I am trying to download a large number of files from a website in R (but I am open to other suggestions as well, such as wget).

From this post, I understand that I must create a vector with the desired URLs. My initial problem is writing this vector, since I have 27 states and 34 agencies within each state. I must download one file for each agency in every state. The state codes are always two characters long, whereas the agency codes are 2 to 7 characters long. The URLs would look like this:

http://website.gov/xx_yyyyyyy.zip

where xx is the state code and yyyyyyy is the agency code, between 2 and 7 characters long. I am lost as to how to build such a loop.

I assume I can then download this list of URLs with the following loop:

for (i in seq_along(urls)) {
  # download one file per iteration; mode = "wb" keeps the zip files binary-safe
  download.file(urls[i], destinations[i], mode = "wb")
}

Does this make sense?

(Disclaimer: an earlier, incomplete version of this post was uploaded previously. My mistake, sorry!)

Recommended answer

This will download the files in batches, one batch per state, and take advantage of the faster simultaneous downloading that download.file() offers when the libcurl method is available in your installation of R:

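library(purrr)

# Example codes only: replace with the actual state and agency codes
states <- state.abb[1:27]
agencies <- c("AID", "AMBC", "AMTRAK", "APHIS", "ATF", "BBG", "DOJ", "DOT",
              "BIA", "BLM", "BOP", "CBFO", "CBP", "CCR", "CEQ", "CFTC", "CIA",
              "CIS", "CMS", "CNS", "CO", "CPSC", "CRIM", "CRT", "CSB", "CSOSA",
              "DA", "DEA", "DHS", "DIA", "DNFSB", "DOC", "DOD", "DOE", "DOI")

# For each state, build the URLs for all agencies and download them as one batch
walk(states, function(state) {
  # sprintf() is vectorised, so this yields one URL per agency
  urls <- sprintf("http://website.gov/%s_%s.zip", state, agencies)
  # a vector of URLs with method = "libcurl" triggers simultaneous downloads;
  # mode = "wb" keeps the zip files intact on Windows
  download.file(urls, basename(urls), method = "libcurl", mode = "wb")
})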
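
If some state/agency combinations might not exist on the server, a slower but more forgiving variant is to build the full URL vector up front and fetch the files one at a time, recording which ones fail. The following is only a sketch under stated assumptions: the state and agency codes are made-up placeholders, and download_each() with its fetch argument is a hypothetical helper, not part of base R or purrr. It gives up the simultaneous downloading of the batch approach in exchange for per-file error handling.

```r
# Hypothetical codes for illustration; replace with the real ones
states   <- c("AA", "BB")
agencies <- c("XY", "ABCDEFG")

# All state/agency combinations; the first column (agency) varies fastest
grid <- expand.grid(agency = agencies, state = states,
                    stringsAsFactors = FALSE)
urls <- sprintf("http://website.gov/%s_%s.zip", grid$state, grid$agency)

# Download one file at a time so that a failing URL does not abort the rest.
# `fetch` defaults to download.file() and can be swapped out for testing.
download_each <- function(urls, dest_dir = ".", fetch = download.file) {
  # fall back to the default method when libcurl support is not compiled in
  method <- if (capabilities("libcurl")) "libcurl" else "auto"
  ok <- vapply(urls, function(u) {
    dest <- file.path(dest_dir, basename(u))
    tryCatch({
      status <- fetch(u, dest, method = method, mode = "wb")
      isTRUE(status == 0)  # download.file() returns 0 on success
    }, error = function(e) FALSE)
  }, logical(1))
  urls[!ok]  # the URLs that could not be downloaded
}

# Example call (commented out because it needs network access):
# failed <- download_each(urls, dest_dir = "downloads")
```

The returned vector can be passed back to download_each() to retry only the failures.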
