Debugging RCurl-based authentication & form submission


Problem description

The SourceForge Research Data Archive (SRDA) is one of the data sources for my dissertation research. I'm having difficulty debugging the following issue related to SRDA data collection.

Data collection from SRDA requires authentication, followed by submitting a Web form with an SQL query. Upon successful processing of the query, the system generates a text file with the query results. While testing my R code for SRDA data collection, I changed the SQL request to make sure that the results file is being regenerated. However, I discovered that the file contents stay the same (they correspond to the previous query). I think the lack of refresh could be due to a failure of either authentication or query form submission. The following is the debug output from the code (https://github.com/abnova/diss-floss/blob/master/import/getSourceForgeData.R):

make importSourceForge

Rscript --no-save --no-restore --verbose getSourceForgeData.R
running
  '/usr/lib/R/bin/R --slave --no-restore --no-save --no-restore --file=getSourceForgeData.R'

Loading required package: RCurl
Loading required package: methods
Loading required package: bitops
Loading required package: digest

Retrieving SourceForge data...

Checking request "SELECT *
FROM sf1104.users a, sf1104.artifact b
WHERE a.user_id = b.submitted_by AND b.artifact_id = 304727"...
* About to connect() to zerlot.cse.nd.edu port 80 (#0)
*   Trying 129.74.152.47... * connected
> POST /mediawiki/index.php?title=Special:Userlogin&action=submitlogin&type=login HTTP/1.1
Host: zerlot.cse.nd.edu
Accept: */*
Content-Length: 37
Content-Type: application/x-www-form-urlencoded

* upload completely sent off: 37out of 37 bytes
< HTTP/1.1 200 OK
< Date: Tue, 11 Mar 2014 03:49:04 GMT
< Server: Apache/2.2.8 (Ubuntu) PHP/5.2.4-2ubuntu5.25 with Suhosin-Patch
< X-Powered-By: PHP/5.2.4-2ubuntu5.25
* Added cookie wiki_db_session="c61...a3c" for domain zerlot.cse.nd.edu, path /, expire 0
< Set-Cookie: wiki_db_session=c61...a3c; path=/
< Content-language: en
< Vary: Accept-Encoding,Cookie
< Expires: Thu, 01 Jan 1970 00:00:00 GMT
< Cache-Control: private, must-revalidate, max-age=0
< Transfer-Encoding: chunked
< Content-Type: text/html; charset=UTF-8
<
* Connection #0 to host zerlot.cse.nd.edu left intact
[1] "Before second postForm()"
* Re-using existing connection! (#0) with host zerlot.cse.nd.edu
* Connected to zerlot.cse.nd.edu (129.74.152.47) port 80 (#0)
> POST /cgi-bin/form.pl HTTP/1.1
Host: zerlot.cse.nd.edu
Accept: */*
Cookie: wiki_db_session=c61...a3c
Content-Length: 129
Content-Type: application/x-www-form-urlencoded

* upload completely sent off: 129out of 129 bytes
< HTTP/1.1 500 Internal Server Error
< Date: Tue, 11 Mar 2014 03:49:04 GMT
< Server: Apache/2.2.8 (Ubuntu) PHP/5.2.4-2ubuntu5.25 with Suhosin-Patch
< Vary: Accept-Encoding
< Connection: close
< Transfer-Encoding: chunked
< Content-Type: text/html
<
* Closing connection #0
Error: Internal Server Error
Execution halted
make: *** [importSourceForge] Error 1

I've tried to figure this out using the debug output as well as the Network protocol analyzer from Firefox's built-in Developer Tools, but so far without much success. I would appreciate any advice and help.

if (!require(RCurl)) install.packages('RCurl')
if (!require(digest)) install.packages('digest')

library(RCurl)
library(digest)

# Users must authenticate to access Query Form
SRDA_HOST_URL  <- "http://zerlot.cse.nd.edu"
SRDA_LOGIN_URL <- "/mediawiki/index.php?title=Special:Userlogin"
SRDA_LOGIN_REQ <- "&action=submitlogin&type=login"

# SRDA URL that Query Form sends POST requests to
SRDA_QUERY_URL <- "/cgi-bin/form.pl"

# SRDA URL of the text file with query results
SRDA_QRESULT_URL <- "/qresult/blekh/blekh.txt"

# Parameters for result's format
DATA_SEP <- ":" # data separator
ADD_SQL  <- "1" # add SQL to file

curl <- getCurlHandle()

srdaLogin <- function (loginURL, username, password) {

  curlSetOpt(curl = curl, cookiejar = 'cookies.txt',
             ssl.verifyhost = FALSE, ssl.verifypeer = FALSE,
             followlocation = TRUE, verbose = TRUE)

  params <- list('wpName1' = username, 'wpPassword1' = password)

  if(url.exists(loginURL)) {
    reply <- postForm(loginURL, .params = params, curl = curl,
                      style = "POST")
    #if (DEBUG) print(reply)
    info <- getCurlInfo(curl)
    return (ifelse(info$response.code == 200, TRUE, FALSE))
  }
  else {
    stop("Can't access login URL!")
  }
}


srdaConvertRequest <- function (request) {

  return (list(select = "*",
               from = "sf1104.users a, sf1104.artifact b",
               where = "b.artifact_id = 304727"))
}


srdaRequestData <- function (requestURL, select, from, where, sep, sql) {

  params <- list('uitems' = select,
                 'utables' = from,
                 'uwhere' = where,
                 'useparator' = sep,
                 'append_query' = sql)

  if(url.exists(requestURL)) {
    reply <- postForm(requestURL, .params = params, #.opts = opts,
                      curl = curl, style = "POST")
  }
}


srdaGetData <- function(request) {

  resultsURL <- paste(SRDA_HOST_URL, SRDA_QRESULT_URL,
                      collapse="", sep="")

  results.query <- readLines(resultsURL, n = 1)

  return (ifelse(results.query == request, TRUE, FALSE))
}


getSourceForgeData <- function (request) {

  # Construct SRDA login and query URLs
  loginURL <- paste(SRDA_HOST_URL, SRDA_LOGIN_URL, SRDA_LOGIN_REQ,
                    collapse="", sep="")
  queryURL <- paste(SRDA_HOST_URL, SRDA_QUERY_URL, collapse="", sep="")

  # Log into the system 
  if (!srdaLogin(loginURL, USER, PASS))
    stop("Login failed!")

  rq <- srdaConvertRequest(request)

  srdaRequestData(queryURL,
                  rq$select, rq$from, rq$where, DATA_SEP, ADD_SQL)

  if (!srdaGetData(request))
    stop("Data collection failed!")
}


message("\nTesting SourceForge data collection...\n")

getSourceForgeData("SELECT * 
FROM sf1104.users a, sf1104.artifact b 
WHERE a.user_id = b.submitted_by AND b.artifact_id = 304727")

# clean up
close(curl)
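
A side note on the check in srdaGetData() above: readLines(..., n = 1) returns a single line, so comparing it with a request string that contains line breaks can never be TRUE, even when the server has processed the query. The following is only a sketch of a whitespace-insensitive comparison. The helper names normalizeSql and sameQuery are hypothetical, and the assumption that the results file starts with the query written on one line is not confirmed by anything above.

```r
# Collapse every run of whitespace (including newlines) into one space
# and trim both ends, so that formatting differences are ignored.
normalizeSql <- function(sql) {
  trimws(gsub("[[:space:]]+", " ", sql))
}

# TRUE when the first line of the results file matches the request
# after whitespace normalization.
sameQuery <- function(firstLine, request) {
  identical(normalizeSql(firstLine), normalizeSql(request))
}

request <- "SELECT * 
FROM sf1104.users a, sf1104.artifact b 
WHERE a.user_id = b.submitted_by AND b.artifact_id = 304727"

# Hypothetical first line of the results file (assumed format)
firstLine <- paste("SELECT * FROM sf1104.users a, sf1104.artifact b",
                   "WHERE a.user_id = b.submitted_by AND b.artifact_id = 304727")

print(firstLine == request)           # plain comparison, as in srdaGetData()
print(sameQuery(firstLine, request))  # whitespace-insensitive comparison
```

This does not address the 500 error itself; it only removes one way the script could report failure after a successful request.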

UPDATE 2 (version without functions):

if (!require(RCurl)) install.packages('RCurl')
library(RCurl)

# Users must authenticate to access Query Form
SRDA_HOST_URL  <- "http://zerlot.cse.nd.edu"
SRDA_LOGIN_URL <- "/mediawiki/index.php?title=Special:Userlogin"
SRDA_LOGIN_REQ <- "&action=submitlogin&type=login"

# SRDA URL that Query Form sends POST requests to
SRDA_QUERY_URL <- "/cgi-bin/form.pl"

# SRDA URL of the text file with query results
SRDA_QRESULT_URL <- "/qresult/blekh/blekh.txt"

# Parameters for result's format
DATA_SEP <- ":" # data separator
ADD_SQL  <- "1" # add SQL to file


message("\nTesting SourceForge data collection...\n")

curl <- getCurlHandle()

curlSetOpt(curl = curl, cookiejar = 'cookies.txt',
           ssl.verifyhost = FALSE, ssl.verifypeer = FALSE,
           followlocation = TRUE, verbose = TRUE)

# === Authentication ===

loginParams <- list('wpName1' = USER, 'wpPassword1' = PASS)

loginURL <- paste(SRDA_HOST_URL, SRDA_LOGIN_URL, SRDA_LOGIN_REQ,
                  collapse="", sep="")

if (url.exists(loginURL)) {
  postForm(loginURL, .params = loginParams, curl = curl, style = "POST")
  info <- getCurlInfo(curl)
  message("\nLogin results - HTTP status code: ", info$response.code, "\n\n")
} else {
  stop("\nCan't access login URL!\n\n")
}

# === Data collection ===

# Previous query was: "SELECT * FROM sf0305.users WHERE user_id < 100"
query <- list(select = "*",
              from = "sf1104.users a, sf1104.artifact b",
              where = "b.artifact_id = 304727") 

getDataParams <- list('uitems'       = query$select,
                      'utables'      = query$from,
                      'uwhere'       = query$where,
                      'useparator'   = DATA_SEP,
                      'append_query' = ADD_SQL)

queryURL <- paste(SRDA_HOST_URL, SRDA_QUERY_URL, collapse="", sep="")

if(url.exists(queryURL)) {
  postForm(queryURL, .params = getDataParams, curl = curl, style = "POST")
  resultsURL <- paste(SRDA_HOST_URL, SRDA_QRESULT_URL,
                      collapse="", sep="")
  results.query <- readLines(resultsURL, n = 1)
  request <- paste(query$select, query$from, query$where)
  if (results.query == request)
    message("\nData request is successful, SQL query: ", request, "\n\n")
  else
    message("\nData request failed, SQL query: ", request, "\n\n")
} else {
  stop("\nCan't access data query URL!\n\n")
}

close(curl)

UPDATE 3 (server-side debugging)

Finally, I was able to get in touch with the person responsible for the system, and he helped me narrow the issue down to what I believe is cookie management. Here's the error log record corresponding to a run of my code:

[Fri Mar 21 15:33:14 2014] [error] [client 54.204.180.203] [Fri Mar 21 15:33:14 2014] form.pl: /tmp/sess_3e55593e436a013597cd320e4c6a2fac: at /var/www/cgi-bin/form.pl line 43

The following is the snippet of the server-side script (Perl) that generated that error (line #1 in the script is the interpreter directive, so the reported line number 43 is most likely line number 44):

42     if (-e "/tmp/sess_$file") {
43     $session = PHP::Session->new($cgi->cookie("$session_name"));
44     $user_id = $session->get('wsUserID');
45     $user_name = $session->get('wsUserName');

The following is the session information (1) after authentication and (2) after submitting the data request, obtained by tracing a manual authentication and a manual data request form submission:

(1) "wiki_dbUserID=449; expires=Sun, 20-Apr-2014 21:04:14 GMT; path=/wiki_dbUserName=Blekh; expires=Sun, 20-Apr-2014 21:04:14 GMT; path=/wiki_dbToken=deleted; expires=Thu, 21-Mar-2013 21:04:13 GMT"

(2) wiki_db_session=aaed058f97059174a59effe44b137cbc; _ga=GA1.2.2065853334.1395410153; EDSSID=e24ff5ed891c28c61f2d1f8dec424274; wiki_dbUserName=Blekh; wiki_dbLoggedOut=20140321210314; wiki_dbUserID=449

I would appreciate any help in figuring out the problem with my code!
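
One way to make the cookie comparison concrete is to diff the cookie names sent by the browser, as in trace (2), against the single cookie that RCurl sent in the debug output (Cookie: wiki_db_session=c61...a3c). The sketch below does this with base R only. parseCookieHeader is a hypothetical helper, not part of RCurl, and which of the missing cookies the server actually requires is unknown.

```r
# Split a Cookie request header value ("a=1; b=2") into a named
# character vector. Hypothetical helper for illustration only.
parseCookieHeader <- function(header) {
  pairs <- trimws(strsplit(header, ";", fixed = TRUE)[[1]])
  pairs <- pairs[nzchar(pairs)]
  # Split each pair at the first "=" only; values may contain "="
  eq <- regexpr("=", pairs, fixed = TRUE)
  values <- substring(pairs, eq + 1)
  names(values) <- substring(pairs, 1, eq - 1)
  values
}

# Cookies from the manual (browser) data request, trace (2)
browserCookies <- parseCookieHeader(paste0(
  "wiki_db_session=aaed058f97059174a59effe44b137cbc; ",
  "_ga=GA1.2.2065853334.1395410153; ",
  "EDSSID=e24ff5ed891c28c61f2d1f8dec424274; ",
  "wiki_dbUserName=Blekh; wiki_dbLoggedOut=20140321210314; ",
  "wiki_dbUserID=449"))

# Cookie sent by RCurl; the value is elided exactly as in the debug output
rcurlCookies <- parseCookieHeader("wiki_db_session=c61...a3c")

# Names present in the browser request but absent from the RCurl request
notSent <- setdiff(names(browserCookies), names(rcurlCookies))
print(notSent)
```

In the debug output the login response set only wiki_db_session, whereas the manual login in trace (1) also set wiki_dbUserID and wiki_dbUserName. That difference may mean the login POST returned 200 without actually authenticating, but this is a guess that would need to be checked against the body of the login reply.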

Recommended answer

I've simplified the code still further:

library(httr)

base_url  <- "http://srda.cse.nd.edu"

loginURL <- modify_url(
  base_url, 
  path = "mediawiki/index.php", 
  query = list(
    title = "Special:Userlogin", 
    action = "submitlogin",
    type = "login",
    wpName1 = USER,
    wpPassword1 = PASS
  )
)
r <- POST(loginURL)
stop_for_status(r)

queryURL <- modify_url(base_url, path = "cgi-bin/form.pl")
query <- list(
  uitems       = "user_name",
  utables      = "sf1104.users a, sf1104.artifact b",
  uwhere       = "a.user_id = b.submitted_by AND b.artifact_id = 304727",
  useparator   = ":",
  append_query = "1"
)
r <- POST(queryURL, body = query, multipart = FALSE)
stop_for_status(r)

But I'm still getting a 500. I tried:
