如何在R中使用httr对shibboleth多主机名网站进行身份验证 [英] how to authenticate a shibboleth multi-hostname website with httr in R

查看:94
本文介绍了如何在R中使用httr对shibboleth多主机名网站进行身份验证的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

注意:ipums international和ipums usa可能使用相同的系统. ipums美国可以加快注册速度.如果您想测试您的代码,请尝试 https://usa.ipums.org /usa-action/users/request_access 进行注册!

note: ipums international and ipums usa probably use the same system. ipums usa allows quicker signup. if you would like to test out your code, try https://usa.ipums.org/usa-action/users/request_access to sign up!

我正在尝试以编程方式从 https://international.ipums.org/下载文件R语言和httr.我需要使用httr而不是RCurl,因为我需要将身份验证后的大型文件 not 下载到RAM中,然后直接下载到磁盘中. 目前仅在据我所知

i am trying to programmatically download a file from https://international.ipums.org/ with the R language and httr. i need to use httr and not RCurl because i need to post-authentication download large files not into RAM but directly to disk. this is currently only possible with httr as far as i know

下面的可复制代码记录了我从登录页面获得的最大努力( https: //international.ipums.org/international-action/users/login )访问身份验证后的主页面.任何提示或提示将不胜感激!谢谢!

the reproducible code below documents my best effort at getting from the login page (https://international.ipums.org/international-action/users/login) to the main post-authentication page. any tips or hints would be appreciated! thanks!

my_email <- "email@address.com"
my_password <- "password"

tf <- tempfile()

# use httr, because i need to download a large file after authentication
# and only httr supports that with its `write_disk()` option
library(httr)

# turn off ssl verify, otherwise the subsequent GET command will fail
set_config( config( ssl_verifypeer = 0L ) )

GET( "https://international.ipums.org/Shibboleth.sso/Login?target=https%3A%2F%2Finternational.ipums.org%2Finternational-action%2Fmenu" )

# connect to the starting login page of the website
( a <- GET( "https://international.ipums.org/international-action/users/login" , verbose( info = TRUE ) ) )

# which takes me through to a lot of websites, but ultimately (in my browser) lands at
shibboleth_url <- "https://live.identity.popdata.org:443/idp/Authn/UserPassword"

# construct authentication information?
base_values <- list( "j_username" = my_email , "j_password" = my_password )
idp_values <- list( "j_username" = my_email , "j_password" = my_password ,  "_idp_authn_lc_key"=subset( a$cookies , domain == "live.identity.popdata.org" )$value , "JSESSIONID" = subset( a$cookies , domain == "#HttpOnly_live.identity.popdata.org" )$value )
ipums_values <- list( "j_username" = my_email , "j_password" = my_password ,  "_idp_authn_lc_key"=subset( a$cookies , domain == "live.identity.popdata.org" )$value , "JSESSIONID" = subset( a$cookies , domain == "international.ipums.org" )$value)

# i believe this is where the main login should happen, but it looks like it's failing
GET( shibboleth_url , query = idp_values )
POST( shibboleth_url , body = base_values )
writeBin( GET( shibboleth_url , query = idp_values )$content , tf )

readLines( tf )
# The MPC account authentication system has encountered an error
# This error can sometimes occur if you did not close your browser after logging out of an application previously.  It may also occur for other reasons.  Please close your browser and try your action again."                                                                      

writeBin( GET( "https://live.identity.popdata.org/idp/profile/SAML2/Redirect/SSO" , query = idp_values )$content , tf )
POST( "https://live.identity.popdata.org/idp/profile/SAML2/Redirect/SSO" , body = idp_values )
readLines( tf )
# same error as above

# return to the main login page..
writeBin( GET( "https://international.ipums.org/international-action/menu" , query = ipums_values )$content , tf )
readLines( tf )
# ..not logged in

推荐答案

您必须使用set_cookies()将cookie发送到服务器:

You have to use set_cookies() to send your cookies to the server:

library(httr)
library(rvest)
#my_email <- "xxx"
#my_password <- "yyy"
tf <- tempfile()
set_config( config( ssl_verifypeer = 0L ) )

# Get first page
p1 <- GET( "https://international.ipums.org/international-action/users/login" , verbose( info = TRUE ) )

# Post Login credentials
b2 <- list( "j_username" = my_email , "j_password" = my_password )
c2 <- c(JSESSIONID=p1$cookies[p1$cookies$domain=="#HttpOnly_live.identity.popdata.org",]$value,
           `_idp_authn_lc_key`=p1$cookies[p1$cookies$domain=="live.identity.popdata.org",]$value)
p2 <- POST(p1$url,body = b2, set_cookies(.cookies = c2), encode="form" )

# Parse hidden fields
h2 <- read_html(p2$content)
form <-  h2 %>% html_form() 

# Post hidden fields
b3 <- list( "RelayState"=form[[1]]$fields[[1]]$value, "SAMLResponse"=form[[1]]$fields[[2]]$value)
c3 <- c(JSESSIONID=p1$cookies[p1$cookies$domain=="#HttpOnly_live.identity.popdata.org",]$value,
           `_idp_session`=p2$cookies[p2$cookies$name=="_idp_session",]$value,
           `_idp_authn_lc_key`=p2$cookies[p2$cookies$name=="_idp_authn_lc_key",]$value)
p3 <- POST( form[[1]]$url , body=b3, set_cookies(.cookies = c3), encode = "form")

# Get interesting page
c4 <- c(JSESSIONID=p3$cookies[p1$cookies$domain=="international.ipums.org" && p3$cookies$name=="JSESSIONID",]$value,
           `_idp_session`=p3$cookies[p3$cookies$name=="_idp_session",]$value,
           `_idp_authn_lc_key`=p3$cookies[p3$cookies$name=="_idp_authn_lc_key",]$value)
p4 <- GET( "https://international.ipums.org/international-action/menu", set_cookies(.cookies = c4) )
writeBin(p4$content , tf )
readLines( tf )[55]

因为结果是

[1] "    <li class=\"lastItem\"><a href=\"/international-action/users/logout\">Logout</a></li>"

我认为您已经登录...

I think you're logged in...

这篇关于如何在R中使用httr对shibboleth多主机名网站进行身份验证的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆