R download file redirect error

Question

Hi there, I am trying to use R to download pdf files via the ProPublica NonProfit Explorer API: https://projects.propublica.org/nonprofits/api

When I query the API, it returns links to pdfs. However these links redirect to AWS, e.g. https://projects.propublica.org/nonprofits/download-filing?path=2015_06_T%2F13-1624100_990T_201406.pdf

I have tried specifying method = "curl", extra='-L' as suggested in this discussion: R download file redirect. This returns status 127.

I have also tried using the "Downloader" package from CRAN. This downloads a file, but it appears to be corrupted in some way: Adobe says "Out of memory" when I try to open it.

Does anyone have any suggestions?

Recommended answer

Just use httr (which you should be using for the API access as well). write_disk() is your bff:

library(httr)

pp_doc_url <- "https://projects.propublica.org/nonprofits/download-filing?path=2015_06_T%2F13-1624100_990T_201406.pdf"

GET(
  url = pp_doc_url,
  write_disk("file.pdf"),
  verbose()
) -> res

Here's the verbose output showing the redirect is followed:

## -> GET /nonprofits/download-filing?path=2015_06_T%2F13-1624100_990T_201406.pdf HTTP/1.1
## -> Host: projects.propublica.org
## -> User-Agent: libcurl/7.54.0 r-curl/3.0 httr/1.3.1
## -> Accept-Encoding: gzip, deflate
## -> Accept: application/json, text/xml, application/xml, */*
## -> 
## <- HTTP/1.1 302 Found
## <- Content-Type: text/html; charset=utf-8
## <- X-Frame-Options: SAMEORIGIN
## <- X-XSS-Protection: 1; mode=block
## <- X-Content-Type-Options: nosniff
## <- Location: https://pp-990.s3.amazonaws.com/2015_06_T/13-1624100_990T_201406.pdf?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAI7C6X5GT42DHYZIA%2F20171202%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20171202T002756Z&X-Amz-Expires=1800&X-Amz-SignedHeaders=host&X-Amz-Signature=f90caae6a793239be8342d0ecbd96ff6f80b1821921cfadae00f78129a38a79f
## <- Cache-Control: max-age=0, private, must-revalidate
## <- Content-Encoding: gzip
## <- Transfer-Encoding: chunked
## <- Accept-Ranges: bytes
## <- Date: Sat, 02 Dec 2017 00:27:57 GMT
## <- Via: 1.1 varnish
## <- Connection: keep-alive
## <- X-Served-By: cache-bos8228-BOS
## <- X-Cache: MISS
## <- X-Cache-Hits: 0
## <- X-Timer: S1512174477.810292,VS0,VE194
## <- Vary: Accept,Accept-Encoding,Content-Type
## <- 
## -> GET /2015_06_T/13-1624100_990T_201406.pdf?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAI7C6X5GT42DHYZIA%2F20171202%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20171202T002756Z&X-Amz-Expires=1800&X-Amz-SignedHeaders=host&X-Amz-Signature=f90caae6a793239be8342d0ecbd96ff6f80b1821921cfadae00f78129a38a79f HTTP/1.1
## -> Host: pp-990.s3.amazonaws.com
## -> User-Agent: libcurl/7.54.0 r-curl/3.0 httr/1.3.1
## -> Accept-Encoding: gzip, deflate
## -> Accept: application/json, text/xml, application/xml, */*
## -> 
## <- HTTP/1.1 200 OK
## <- x-amz-id-2: fycJGU5JQZ+o+aTOWFa86ZFyasv7XEH6RGsmXNo29+CtgDC8IZ438Ek61Bo/nUlRhk3fPKPXdMg=
## <- x-amz-request-id: AB2E8B3421A6B7BB
## <- Date: Sat, 02 Dec 2017 00:27:58 GMT
## <- Last-Modified: Thu, 13 Aug 2015 19:22:03 GMT
## <- ETag: "fd89377252531684bec1828db05c54e6"
## <- Cache-Control: no-cache, no-store
## <- Content-Language: en
## <- Accept-Ranges: bytes
## <- Content-Type: application/pdf
## <- Content-Length: 537542
## <- Server: AmazonS3
## <- 

Here's the contents of the response object:

res
## Response [https://pp-990.s3.amazonaws.com/2015_06_T/13-1624100_990T_201406.pdf?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAI7C6X5GT42DHYZIA%2F20171202%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20171202T002756Z&X-Amz-Expires=1800&X-Amz-SignedHeaders=host&X-Amz-Signature=f90caae6a793239be8342d0ecbd96ff6f80b1821921cfadae00f78129a38a79f]
##   Date: 2017-12-02 00:27
##   Status: 200
##   Content-Type: application/pdf
##   Size: 538 kB
## <ON DISK>  file.pdf

And, here's proof the file was downloaded:

file.info("file.pdf")
##            size isdir mode               mtime               ctime               atime uid gid    uname grname
## file.pdf 537542 FALSE  644 2017-12-01 19:27:57 2017-12-01 19:27:57 2017-12-01 19:27:58 xxx  xx xxxxxxxx  xxxxx
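
Since the question mentions a download that would not open in Adobe, a size check alone may not be enough. The sketch below is one possible sanity check, not part of the original answer: it reads the first five bytes of the file and compares them with the "%PDF-" signature that PDF files normally start with. The helper name is_pdf_file() is hypothetical.

```r
# Hypothetical helper (not from the original answer): returns TRUE when the
# file exists and begins with the "%PDF-" signature, FALSE otherwise.
is_pdf_file <- function(path) {
  if (!file.exists(path)) {
    return(FALSE)
  }
  # Read at most the first 5 bytes as raw values, so binary content
  # (including embedded NUL bytes) is handled safely.
  magic <- readBin(path, what = "raw", n = 5L)
  identical(magic, charToRaw("%PDF-"))
}

# Usage: check the file written by write_disk() above.
is_pdf_file("file.pdf")
```

A FALSE result here usually suggests that an HTML error page or a truncated body was saved instead of the document.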

Leave out verbose() in "production".
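
For illustration, a non-verbose version of the call could be wrapped as in the sketch below. This is an assumption about how one might package it, not the answer author's code: the function name download_filing() and its arguments are hypothetical, while GET(), write_disk(), http_error(), stop_for_status(), and http_type() are existing httr functions. It needs the httr package installed to run.

```r
# Hypothetical wrapper (names are illustrative, not from the original answer).
# Downloads `url` to `dest`, following redirects as httr does by default.
download_filing <- function(url, dest) {
  # No verbose() here; overwrite = TRUE lets repeated runs replace the file.
  res <- httr::GET(url, httr::write_disk(dest, overwrite = TRUE))

  # On an HTTP error the body written to disk is an error page, so remove it
  # before raising an R error that carries the status.
  if (httr::http_error(res)) {
    unlink(dest)
    httr::stop_for_status(res)
  }

  # Warn if the server did not label the response as a PDF.
  if (!identical(httr::http_type(res), "application/pdf")) {
    warning("Unexpected content type: ", httr::http_type(res))
  }

  invisible(res)
}

# Usage (requires network access):
# download_filing(pp_doc_url, "file.pdf")
```

Checking the status before trusting the file is a design choice here: write_disk() saves whatever body the server returns, including error responses.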
