R - posting a login form using RCurl


Question

I am new to using R to post forms and then download data from the web. My mistake is probably easy for someone out there to spot, so I appreciate your patience. I use a Win7 PC, and Firefox 23.x is my usual browser.

I am trying to post the main form that shows up on

http://www.aplia.com/

I have the following R script:

your.username <- 'username'
your.password <- 'password'
setwd("C:/Users/Desktop/Aplia/data")

require(SAScii)
require(RCurl)
require(XML)
agent = "Firefox/23.0"

options(RCurlOptions = list(cainfo = system.file("CurlSSL", "cacert.pem", package = "RCurl")))
curl = getCurlHandle()
curlSetOpt(
  cookiejar = 'cookies.txt',
  useragent = agent,
  followlocation = TRUE,
  autoreferer = TRUE,
  curl = curl
)

# list parameters to pass to the website (pulled from the source html)
params <-
  list(
    'userAgent' = agent,
    'screenWidth' = "",
    'screenHeight' = "",
    'flashMajor' = "",
    'flashMinor' = "",
    'flashBuild' = "",
    'flashPatch' = "",
    'redirect' = "",
    'referrer' = "http://www.aplia.com",
    'txtEmail' = your.username,
    'txtPassword' = your.password
  )

# logs into the form
html = postForm('https://courses.aplia.com/', .params = params, curl = curl)
html

# download a file once form is posted
html <-
  getURL(
    "http://courses.aplia.com/af/servlet/mngstudents?ctx=filename",
    curl = curl
  )
html

But from there I can tell that I am not getting the page I want: what is returned into html is a redirect message that appears to be asking me to log in again (?):

"\r\n\r\n<html>\r\n<head>\r\n    <title>Aplia</title>\r\n\t<script language=\"JavaScript\" type=\"text/javascript\">\r\n\r\n        top.location.href = \"https://courses.aplia.com/af/servlet/login?action=form&redirect=%2Fservlet%2Fmngstudents%3Fctx%3Dfilename\";\r\n    \r\n\t</script>\r\n</head>\r\n<body>\r\n    Click <a href=\"https://courses.aplia.com/af/servlet/login?action=form&redirect=%2Fservlet%2Fmngstudents%3Fctx%3Dfilename\">here</a> to continue.\r\n</body>\r\n</html>\r\n"

I do believe that a series of redirects occurs once the form is posted successfully (manually, in a browser). How can I tell whether the form was posted correctly?

I am quite sure that once I can get the post working correctly, I won't have a problem directing R to download the files I need (online activity reports for each of my 500 students this semester). But I have spent several hours working on this and am stuck. Maybe I need to set more options in the RCurl package that have to do with cookies (as the site does use cookies)?

Any help is much appreciated! I typically use R to handle statistical data, so I am new to these packages and functions.

Solution

The answer ends up being very simple. For some reason, I didn't see one option that needs to be included in postForm:

html = postForm('https://courses.aplia.com/', .params = params, curl = curl, style="POST")
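For context: as far as the RCurl documentation describes it, style = "POST" makes postForm send the fields as application/x-www-form-urlencoded, while the default style ("HTTPPOST") sends multipart/form-data. Presumably the login servlet only reads the former. The snippet below is a rough base-R sketch of what a URL-encoded body looks like. encode_form is a hypothetical helper written for illustration; it is not part of RCurl and not how RCurl builds the request internally, and the field values are made up.

```r
# Hypothetical helper (not part of RCurl): build an
# application/x-www-form-urlencoded body from a named list,
# using only base R's utils::URLencode.
encode_form <- function(params) {
  # Percent-encode every field name and every value, reserved characters included
  keys <- vapply(names(params), utils::URLencode, character(1), reserved = TRUE)
  values <- vapply(
    params,
    function(v) utils::URLencode(as.character(v), reserved = TRUE),
    character(1)
  )
  # Join as key=value pairs separated by "&"
  paste(keys, values, sep = "=", collapse = "&")
}

# Made-up credentials, only to show the encoding of a space and an ampersand
body <- encode_form(list(txtEmail = "username", txtPassword = "pass word&1"))
print(body)
```

A multipart/form-data body, by contrast, wraps each field in its own boundary-delimited part, so a server that parses only URL-encoded bodies would not see the fields at all.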

And that's it...
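As for how to tell whether the form was posted correctly, one rough check is to look for the login-servlet URL in whatever comes back, since the failed response quoted in the question redirects to /af/servlet/login. The sketch below assumes that marker string is a reliable sign of a failed login, which is inferred from that single response only. looks_like_login_redirect is a hypothetical helper name, and both sample pages are made up for illustration.

```r
# Hypothetical helper: TRUE when the page body contains the login-servlet URL,
# i.e. the server is asking for a login again.
# The default marker is an assumption based on the response quoted in the question.
looks_like_login_redirect <- function(html, marker = "/af/servlet/login?action=form") {
  grepl(marker, html, fixed = TRUE)
}

# Made-up sample bodies: one shaped like the redirect page, one like a normal page
failed <- "<script>top.location.href = \"https://courses.aplia.com/af/servlet/login?action=form&redirect=%2Fservlet\";</script>"
ok <- "<html><body>Student report</body></html>"

print(looks_like_login_redirect(failed))
print(looks_like_login_redirect(ok))
```

In the script from the question, the same function could be applied to the value returned by getURL before trying to parse it as a report.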
