How can I POST a simple HTML form in R?


Problem description

I'm relatively new to R programming and I'm trying to put some of the stuff I'm learning in the Johns Hopkins Data Science track to practical use. Specifically, I would like to automate the process of downloading historical bond prices from the US Treasury website.

Using both Firefox and R, I was able to determine that the US Treasury website uses a very simple HTML POST form to specify a single date for the quotes of interest. It then returns a table of secondary market information for all outstanding bonds.

I have unsuccessfully tried to use two different R packages to submit a request to the US Treasury web server. Here are the two approaches I tried:

Attempt #1 (using RCurl):

library(RCurl)

url <- "https://www.treasurydirect.gov/GA-FI/FedInvest/selectSecurityPriceDate.htm"
# Form fields are passed as named arguments; .opts carries the curl options
td.html <- postForm(url,
                    submit = "Show Prices",
                    priceDate.year  = 2014,
                    priceDate.month = 12,
                    priceDate.day   = 15,
                    .opts = curlOptions(ssl.verifypeer = FALSE))

This results in a web page being returned and stored in td.html but all it contains is an error message from the treasurydirect server. I know the server is working because when I submit the same request via my browser, I get the expected results.

Attempt #2 (using rvest):

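library(rvest)

# Note: rvest 1.0.0 renamed these to session(), html_form_set() and session_submit()
s <- html_session(url)
f0 <- html_form(s)
f1 <- set_values(f0[[2]], priceDate.year=2014, priceDate.month=12, priceDate.day=15)
test <- submit_form(s, f1)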

Unfortunately, this approach doesn't even leave R and results in the following error message from R:

Submitting with 'submit'
Error in function (type, msg, asError = TRUE)  : <url> malformed

I can't seem to figure out how to see what "malformed" text is being sent to rvest so that I can try to diagnose the problem.

Any suggestions or tips for solving this seemingly simple task would be greatly appreciated!

Recommended answer

Well, it appears to work with the httr library.

library(httr)

url <- "https://www.treasurydirect.gov/GA-FI/FedInvest/selectSecurityPriceDate.htm"

fd <- list(
    submit = "Show Prices",
    priceDate.year  = 2014,
    priceDate.month = 12,
    priceDate.day   = 15
)

# encode = "form" sends the fields as application/x-www-form-urlencoded
resp <- POST(url, body = fd, encode = "form")
content(resp)
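
Since the goal in the question is to automate downloads for many dates, the same request can be wrapped in a small function. This is only a sketch under assumptions: `price_form_body()` and `fetch_prices_page()` are hypothetical helper names that do not come from the original answer, and the server is assumed to accept the same four fields for any date. The month and day are sent as plain integers, as in the example above.

```r
# Hypothetical helper: build the form fields for a single quote date.
price_form_body <- function(date) {
  date <- as.Date(date)
  list(
    submit          = "Show Prices",
    priceDate.year  = as.integer(format(date, "%Y")),
    priceDate.month = as.integer(format(date, "%m")),
    priceDate.day   = as.integer(format(date, "%d"))
  )
}

# Hypothetical helper: POST the form and return the response body as text.
fetch_prices_page <- function(date, url) {
  resp <- httr::POST(url, body = price_form_body(date), encode = "form")
  httr::stop_for_status(resp)  # raise an R error on HTTP 4xx/5xx responses
  httr::content(resp, as = "text", encoding = "UTF-8")
}
```

Calling `fetch_prices_page("2014-12-15", url)` should then send the same fields as the `fd` list above. Note that `stop_for_status()` only catches HTTP-level failures; an error page served with status 200 would still need to be detected by inspecting the returned HTML.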

The rvest library is really just a wrapper around httr. It looks like it doesn't do a good job of resolving root-relative URLs, that is, paths that start with / but omit the server name. So if you look at

f1$url
# [1] /GA-FI/FedInvest/selectSecurityPriceDate.htm

you see that it just has the path and not the server name. This appears to confuse httr. If you do

f1 <- set_values(f0[[2]], priceDate.year=2014, priceDate.month=12, priceDate.day=15)
f1$url <- url
test <- submit_form(s, f1)

that seems to work. Perhaps it's a bug that should be reported to rvest. (Tested on rvest_0.1.0)
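
Overwriting `f1$url` with the page URL works here because the form posts back to the page it was loaded from. A slightly more general workaround, sketched below, resolves the form's action against the page URL instead. `resolve_action()` is a hypothetical helper, not part of rvest; it relies on `xml2::url_absolute()`, which comes from the xml2 package that rvest depends on.

```r
# Hypothetical helper (not part of rvest): turn a form action into an absolute URL.
resolve_action <- function(action, page_url) {
  # URLs that already carry a scheme and host are returned unchanged.
  if (grepl("^[A-Za-z][A-Za-z0-9+.-]*://", action)) {
    return(action)
  }
  # Otherwise resolve the path against the URL of the page holding the form.
  xml2::url_absolute(action, page_url)
}

page_url <- "https://www.treasurydirect.gov/GA-FI/FedInvest/selectSecurityPriceDate.htm"
resolve_action("/GA-FI/FedInvest/selectSecurityPriceDate.htm", page_url)
```

With the objects from the rvest attempt, this would be used as `f1$url <- resolve_action(f1$url, url)` before calling `submit_form(s, f1)`.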
