How do I use cookies with RCurl?
Question
I am trying to write an R package that accesses some data via a REST API. The API, however, doesn't use http authentication, but rather relies on cookies to keep credentials with the session.
Essentially, I'd like to replace the following two lines from a bash script with two R functions: One to perform the login, and store the session cookie, and the second to GET the data.
curl -X POST -c cookies.txt -d"username=xxx&password=yyy" http://api.my.url/login
curl -b cookies.txt http://api.my.url/data
I'm clearly not understanding how RCurl works with curl options. My script as it stands has:
library(RCurl)
curl <- getCurlHandle()
curlSetOpt(cookiejar='cookies.txt', curl=curl)
postForm("http://api.my.url/login", username='xxx', password='yyy', curl=curl)
getURL('http://api.my.url/data', curl=curl)
The final getURL() fails with a "Not logged in." message from the server, and after the postForm() no cookies.txt file exists.
Recommended answer
In general you don't need to create a cookie file, unless you want to study the cookies.
That said, real-world web servers often rely on user-agent data, redirects, and hidden POST data, but this should help:
library(RCurl)
#Set your browsing links
loginurl = "http://api.my.url/login"
dataurl = "http://api.my.url/data"
#Set user account data and agent
pars=list(
username="xxx",
password="yyy"
)
agent="Mozilla/5.0" #or whatever
#Set RCurl pars
curl = getCurlHandle()
curlSetOpt(cookiejar="cookies.txt", useragent = agent, followlocation = TRUE, curl=curl)
#Also if you do not need to read the cookies.
#curlSetOpt( cookiejar="", useragent = agent, followlocation = TRUE, curl=curl)
#Post login form
html=postForm(loginurl, .params = pars, curl=curl)
#Go wherever you want
html=getURL(dataurl, curl=curl)
#Start parsing your page
matchref=gregexpr("... my regexp ...", html)
#... .... ...
#Clean up. The cookie file is written when the handle is garbage collected
rm(curl)
gc()
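The question asked for two R functions, one to log in and one to fetch the data. A minimal sketch of that split is below. The names api_login and api_get_data are hypothetical, the URLs are the placeholders from the question, and the sketch assumes the server accepts a URL-encoded form, as the `curl -d` call in the bash script sends. It keeps cookies in memory by setting cookiefile to an empty string, which relies on libcurl's documented behaviour for that option.

```r
library(RCurl)

# Hypothetical helper: log in and return a handle that carries the session cookie.
api_login <- function(username, password,
                      loginurl = "http://api.my.url/login") {
  # cookiefile = "" enables libcurl's cookie engine without reading any file,
  # so cookies live in memory on this handle only.
  handle <- getCurlHandle(cookiefile = "", followlocation = TRUE)
  # style = "post" sends application/x-www-form-urlencoded, like `curl -d`.
  postForm(loginurl,
           .params = list(username = username, password = password),
           style = "post", curl = handle)
  handle
}

# Hypothetical helper: fetch the data, reusing the cookies stored on the handle.
api_get_data <- function(handle, dataurl = "http://api.my.url/data") {
  getURL(dataurl, curl = handle)
}

# Usage (needs a reachable server, so it is left commented out):
# session <- api_login("xxx", "yyy")
# data <- api_get_data(session)
```

The important design point is that both requests share the same handle: the cookie set by the login response is stored on it and sent again with the later GET.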
Important
There can often be hidden POST data beyond the username and password. To capture it you may want, e.g. in Chrome, to use Developer tools (Ctrl+Shift+I) -> Network tab, to show the POST field names and values.
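Hidden fields can also be read from the login page itself before posting. The following is a rough sketch in base R, assuming the fields appear as quoted attributes on `<input type="hidden">` tags. The helper hidden_fields, the sample HTML and the csrf_token field name are all made up for illustration, and a regular expression is only a stand-in for a real HTML parser.

```r
# Hypothetical helper: collect name/value pairs from <input type="hidden"> tags.
hidden_fields <- function(html) {
  # Pull out every <input ...> tag.
  tags <- regmatches(html, gregexpr("<input[^>]*>", html, ignore.case = TRUE))[[1]]
  # Keep only the tags declared as hidden.
  is_hidden <- grepl("type[[:space:]]*=[[:space:]]*[\"']hidden[\"']",
                     tags, ignore.case = TRUE)
  tags <- tags[is_hidden]

  # Return the quoted value of one attribute, or NA if the attribute is absent.
  attr_value <- function(tag, attr) {
    pattern <- paste0("[[:space:]]", attr,
                      "[[:space:]]*=[[:space:]]*[\"']([^\"']*)[\"']")
    m <- regmatches(tag, regexec(pattern, tag, ignore.case = TRUE))[[1]]
    if (length(m) == 2) m[2] else NA_character_
  }

  field_names <- vapply(tags, attr_value, character(1), attr = "name",
                        USE.NAMES = FALSE)
  field_values <- vapply(tags, attr_value, character(1), attr = "value",
                         USE.NAMES = FALSE)
  field_values[is.na(field_values)] <- ""  # hidden field without a value
  keep <- !is.na(field_names)              # a field needs a name to be posted
  as.list(setNames(field_values[keep], field_names[keep]))
}

# Made-up login page used only to exercise the helper.
login_page <- paste0(
  '<form><input type="hidden" name="csrf_token" value="abc123">',
  '<input type="text" name="username"></form>'
)

# Merge the hidden fields with the credentials before posting.
pars <- c(list(username = "xxx", password = "yyy"), hidden_fields(login_page))
```

In a real session the page would first be fetched with getURL(loginurl, curl=curl) on the same handle, and the merged list would then be passed to postForm() as .params.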