Use href and target in download.file R?

Views: 102 | Published: 2020/10/26 1:32:03 | Tags: r, download


Question

I have this piece of code:

library(assertthat)  # provides assert_that()

raw_prefix <- file.path("data", "raw")

fpa_prefix <- file.path(raw_prefix, "fpa-fod")

if (!dir.exists(fpa_prefix)) {
  # recursive = TRUE also creates data/ and data/raw if they are missing
  dir.create(fpa_prefix, recursive = TRUE)
}

fpa_gdb <- file.path(fpa_prefix, "RDS-2013-0009.4_GDB", "Data", "FPA_FOD_20170508.gdb")

if (!file.exists(fpa_gdb)) {
  loc <- "https://www.fs.usda.gov/rds/fedora/objects/RDS:RDS-2013-0009.4/datastreams/RDS-2013-0009.4_GDB/content"
  dest <- paste0(fpa_prefix, ".zip")
  download.file(loc, dest)
  unzip(dest, exdir = fpa_prefix)
  unlink(dest)
  assert_that(file.exists(fpa_gdb))
}

This works great with most websites for downloading files on the fly in the name of reproducible workflows, but one dataset I need is served through a link with "href" and "target" attributes, which makes it very difficult to download using download.file().

The file is found (also in the above code) here:

https://www.fs.usda.gov/rds/archive/Product/RDS-2013-0009.4/

Towards the bottom of the page is a file called

RDS-2013-0009.4_GDB.zip

which is the file I am trying to download using the above script.

If you inspect this element you will find the structure below, which returns the correct file when clicked in a browser. But how do I translate it into R code?

<a href="//www.fs.usda.gov/rds/fedora/objects/RDS:RDS-2013-0009.4/datastreams/RDS-2013-0009.4_GDB/content" target="_blank">RDS-2013-0009.4_GDB.zip</a>

If anyone has an idea on how to download this file I would GREATLY appreciate it!

Thanks!

Recommended answer

This will:

find all the .zip links on the page (URL and file name)

go through each one found and download it the way a browser would.

Note that write_disk() won't overwrite existing files by default, so if a download gets interrupted, either delete the partial file or pass overwrite = TRUE.

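library(rvest)  # read_html(), html_nodes(), html_attr(), html_text()
library(httr)   # GET(), write_disk(), progress()
library(purrr)  # walk2()

pg <- read_html("https://www.fs.usda.gov/rds/archive/Product/RDS-2013-0009.4/")

# Select the <a> elements in the product list whose text contains "zip"
fils <- html_nodes(pg, xpath=".//dd[@class='product']//li/a[contains(., 'zip')]")

# .x is the protocol-relative href, .y is the link text used as the file name
walk2(html_attr(fils, 'href'), html_text(fils),
      ~GET(sprintf("https:%s", .x), write_disk(.y), progress()))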
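
The sprintf("https:%s", .x) call is needed because the href in the page is protocol-relative: it starts with "//" and carries no scheme, so a browser fills in the scheme of the current page, while R has to be told which one to use. A minimal base R sketch of that step follows; absolute_href() and its scheme argument are hypothetical names introduced here for illustration, not part of rvest or httr.

```r
# Hypothetical helper: give protocol-relative hrefs ("//host/path") a scheme.
# Hrefs that already start with something else are returned unchanged.
absolute_href <- function(href, scheme = "https") {
  protocol_relative <- startsWith(href, "//")
  href[protocol_relative] <- paste0(scheme, ":", href[protocol_relative])
  href
}

# The href taken from the <a> element shown in the question
href <- "//www.fs.usda.gov/rds/fedora/objects/RDS:RDS-2013-0009.4/datastreams/RDS-2013-0009.4_GDB/content"
absolute_href(href)
```

Unlike the bare sprintf(), this leaves an href that is already absolute alone, which may matter if a page mixes both kinds of link. Hrefs that are relative paths (for example "files/x.zip") are not handled here; resolving those against the page URL is what xml2::url_absolute() is for.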

If you don't want to use purrr, the download step can be done in base R (fils still comes from the rvest code above):

invisible(
  mapply(
    download.file,
    url = sprintf("https:%s", html_attr(fils, 'href')),
    destfile = html_text(fils),
    # mode = "wb" keeps the .zip files intact on Windows
    MoreArgs = list(mode = "wb")
  )
)
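
The earlier note about write_disk() has a counterpart here: download.file() writes straight to destfile, so an interrupted run can leave a truncated file that a later file.exists() check would mistake for a finished download. One way to guard against that, sketched below under the assumption that the destination directory is writable, is to download to a temporary file beside the destination and rename it only on success. download_if_missing() is a hypothetical helper name, not a function from base R or any package.

```r
# Hypothetical helper: download `url` to `destfile` unless it already exists.
# Returns TRUE (invisibly) if a download happened, FALSE if it was skipped.
download_if_missing <- function(url, destfile, overwrite = FALSE) {
  if (file.exists(destfile) && !overwrite) {
    return(invisible(FALSE))
  }
  # Keep the partial file next to the destination so the final rename
  # stays on one filesystem.
  tmp <- tempfile(pattern = "partial-", tmpdir = dirname(destfile))
  on.exit(unlink(tmp), add = TRUE)
  # mode = "wb" keeps binary files such as .zip intact on Windows.
  status <- download.file(url, tmp, mode = "wb", quiet = TRUE)
  if (status != 0) stop("download failed for ", url)
  # Remove any old copy first so the rename behaves the same on every platform.
  if (file.exists(destfile)) unlink(destfile)
  if (!file.rename(tmp, destfile)) stop("could not move download to ", destfile)
  invisible(TRUE)
}
```

It can be dropped into the mapply() call in place of download.file, since it takes the same url and destfile arguments. Rerunning the script then skips files that are already complete and retries only the ones that never finished.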
