Use href and target in download.file in R?
Problem description
I have this code:
library(assertthat)  # provides assert_that()

raw_prefix <- file.path("data", "raw")
fpa_prefix <- file.path(raw_prefix, "fpa-fod")
# recursive = TRUE also creates data/ and data/raw/ if they are missing
if (!dir.exists(fpa_prefix)) {
  dir.create(fpa_prefix, recursive = TRUE)
}

fpa_gdb <- file.path(fpa_prefix, "RDS-2013-0009.4_GDB", "Data", "FPA_FOD_20170508.gdb")
# Download and unpack the archive only if the geodatabase is not already there
if (!file.exists(fpa_gdb)) {
  loc <- "https://www.fs.usda.gov/rds/fedora/objects/RDS:RDS-2013-0009.4/datastreams/RDS-2013-0009.4_GDB/content"
  dest <- paste0(fpa_prefix, ".zip")
  download.file(loc, dest)
  unzip(dest, exdir = fpa_prefix)
  unlink(dest)
  assert_that(file.exists(fpa_gdb))
}
This works well for downloading files on the fly from most websites, in the name of reproducible workflows. However, one dataset I need sits behind a link with "href" and "target" attributes, which makes it very difficult to download using download.file().
The file is linked from this page (its download URL is also in the code above):
https://www.fs.usda.gov/rds/archive/Product/RDS-2013-0009.4/
Toward the bottom of the page is a file called RDS-2013-0009.4_GDB.zip, which is the file I am trying to download with the script above.
If you inspect this element in the browser, you find the following markup, and clicking the link returns the correct file. But how do I translate this into R code?
<a href="//www.fs.usda.gov/rds/fedora/objects/RDS:RDS-2013-0009.4/datastreams/RDS-2013-0009.4_GDB/content" target="_blank">RDS-2013-0009.4_GDB.zip</a>
If anyone has an idea of how to download this file, I would GREATLY appreciate it!
Thanks!
Recommended answer
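This will:
- find all .zip links on the page (their URLs and file names)
- go through each file found, prepend "https:" to its protocol-relative href, and download it as a browser would.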
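The target="_blank" attribute only tells the browser to open the link in a new tab or window; it plays no part in the download. What matters is the href, which is protocol-relative: it starts with "//" and inherits the scheme (here https) of the page it appears on, so download.file() needs the scheme added back. A minimal base R sketch using a hypothetical helper, resolve_href(), which assumes https, leaves already-absolute URLs untouched, and does not handle site-relative paths such as "/rds/...":

```r
# Hypothetical helper: turn an href scraped from the page into a URL that
# download.file() can fetch. Protocol-relative hrefs ("//host/path") inherit
# the scheme of the page, so we assume https. Site-relative paths ("/path")
# are NOT handled and are returned unchanged.
resolve_href <- function(href, scheme = "https") {
  ifelse(startsWith(href, "//"), paste0(scheme, ":", href), href)
}

# The href copied from the inspected <a> element
href <- "//www.fs.usda.gov/rds/fedora/objects/RDS:RDS-2013-0009.4/datastreams/RDS-2013-0009.4_GDB/content"
resolve_href(href)
```

The result is the same URL the question's code already assigns to loc, which is exactly what sprintf("https:%s", ...) builds for every scraped link.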
Note that write_disk() won't overwrite existing files, so if a download gets interrupted, either delete the partial file or use overwrite = TRUE.
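library(rvest)
library(httr)
library(purrr)

pg <- read_html("https://www.fs.usda.gov/rds/archive/Product/RDS-2013-0009.4/")

# All links inside the product listing whose text mentions "zip"
fils <- html_nodes(pg, xpath = ".//dd[@class='product']//li/a[contains(., 'zip')]")

# For each (href, link text) pair: prepend "https:" to the protocol-relative
# href and stream the response to a file named after the link text
walk2(html_attr(fils, "href"), html_text(fils),
      ~GET(sprintf("https:%s", .x), write_disk(.y), progress()))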
If you don't want to use purrr, the loop can be done in base R (the scraping above still uses rvest):
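# mapply() pairs each URL with its file name, like walk2() above.
# mode = "wb" forces a binary transfer: on Windows, download.file() defaults
# to text mode unless the URL ends in .zip etc., and these URLs end in /content
invisible(
  mapply(
    download.file,
    url = sprintf("https:%s", html_attr(fils, "href")),
    destfile = html_text(fils),
    MoreArgs = list(mode = "wb")
  )
)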
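Because httr's write_disk() refuses to overwrite an existing file, a half-finished file from an interrupted run blocks the next attempt. One way around that, sketched here with base R only and a hypothetical download_atomic() helper (not part of httr or base R), is to download into a temporary ".part" file next to the destination and rename it into place only after the transfer succeeds. An existing complete file is then simply skipped, and a failed run leaves nothing behind under the final name.

```r
# Hypothetical helper (not part of httr or base R): download to a temporary
# file in the destination directory and rename it into place only once the
# transfer has finished, so an interrupted download never leaves a partial
# file under the final name.
download_atomic <- function(url, destfile, overwrite = FALSE) {
  if (file.exists(destfile) && !overwrite) {
    message("Skipping existing file: ", destfile)
    return(invisible(destfile))
  }
  # Same directory as destfile, so the final rename stays on one filesystem
  tmp <- tempfile(tmpdir = dirname(destfile), fileext = ".part")
  # Clean up the temporary file if anything below fails
  on.exit(if (file.exists(tmp)) unlink(tmp), add = TRUE)
  # mode = "wb" forces a binary transfer, which matters for .zip files on Windows
  status <- download.file(url, tmp, mode = "wb", quiet = TRUE)
  if (status != 0) stop("download.file() failed with status ", status)
  if (!file.rename(tmp, destfile)) stop("Could not move download to ", destfile)
  invisible(destfile)
}
```

For example, download_atomic(loc, "RDS-2013-0009.4_GDB.zip") with loc set to the full https URL from the question; running it a second time skips the file once it exists.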