How to use R to download a zipped file from a SSL page that requires cookies


Question

I am trying to download a file from an https page that requires an "I Agree" button to be clicked and then stores a cookie. My apologies if the answer is obvious somewhere.

When I open up the web page directly in Chrome and click "I Agree" - the file starts to download automatically.

http://www.icpsr.umich.edu/cgi-bin/bob/zipcart2?path=SAMHDA&study=32722&bundle=delimited&ds=1&dups=yes

I tried to replicate this example, but I don't think the Hang Seng website actually stores a cookie or any authentication, so I don't know whether that example is all I need.

Beyond that, I believe SSL complicates the authentication, since I think the getURL() call will require a certificate specification like cainfo = system.file("CurlSSL", "cacert.pem", package = "RCurl").

I'm too much of a beginner with RCurl to know if this website is pretty difficult or if I'm just missing something obvious.

Thank you!

Solution

This is a bit easier to do with httr because it sets up everything so that cookies and https work seamlessly.

The easiest way to generate the cookies is to have the site do it for you, by manually posting the information that the "I agree" form generates. You then do a second request to download the actual file.

library(httr)
terms <- "http://www.icpsr.umich.edu/cgi-bin/terms"
download <- "http://www.icpsr.umich.edu/cgi-bin/bob/zipcart2"

values <- list(agree = "yes", path = "SAMHDA", study = "32722", ds = "", 
  bundle = "all", dups = "yes")

# Accept the terms on the form, 
# generating the appropriate cookies
POST(terms, body = values)
GET(download, query = values)

# Actually download the file (this will take a while)
resp <- GET(download, query = values)

# write the content of the download to a binary file
writeBin(content(resp, "raw"), "c:/temp/thefile.zip")
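If the terms were not actually accepted, the second request may return an HTML page rather than the archive, and writeBin() would save it under a .zip name without complaint. A minimal sketch of a guard against that, assuming the download is a regular non-empty ZIP file (these start with the bytes "PK\003\004"); the helper names is_zip and save_if_zip are hypothetical and not part of httr or of the original answer:

```r
# Hypothetical helper (not from the original answer): TRUE when the raw
# vector starts with the ZIP local-file-header signature "PK\003\004".
# Assumption: the archive is non-empty; an empty ZIP starts with different bytes.
is_zip <- function(bytes) {
  signature <- as.raw(c(0x50, 0x4b, 0x03, 0x04))
  is.raw(bytes) && length(bytes) >= 4 && identical(bytes[1:4], signature)
}

# Hypothetical helper: write the bytes to `path` only if they look like a ZIP,
# otherwise stop with an error instead of saving an HTML page as a .zip file.
save_if_zip <- function(bytes, path) {
  if (!is_zip(bytes)) {
    stop("Response is not a ZIP archive; the terms may not have been accepted.")
  }
  writeBin(bytes, path)
  invisible(path)
}

# Intended use with the response obtained above:
# save_if_zip(content(resp, "raw"), "c:/temp/thefile.zip")
```

The check only inspects the first four bytes, so it catches an HTML terms page but does not prove that the archive is complete.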
