How to download a file behind a semi-broken javascript asp function with R

Problem description

I am trying to fix a download automation script that I provide publicly so that anyone can easily download the World Values Survey with R.

On this web page, http://www.worldvaluessurvey.org/WVSDocumentationWV4.jsp, the PDF link "WVS_2000_Questionnaire_Root" downloads easily in Firefox and Chrome. I cannot figure out how to automate the download with httr, RCurl, or any other R package. Chrome's developer tools show that the link has to follow through to the ultimate source, http://www.worldvaluessurvey.org/wvsdc/DC00012/F00001316-WVS_2000_Questionnaire_Root.pdf, but if you open that URL directly, there's a connectivity error. I am unclear whether this is related to the request header Upgrade-Insecure-Requests: 1 or to the 302 status code in the response.

Clicking around the new worldvaluessurvey.org website with Chrome's Inspect Element window open makes me think some hacky coding decisions were made here, hence "semi-broken" in the title :/
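One way to tell the two suspects apart is to request the PDF URL without following redirects, once bare and once with the Upgrade-Insecure-Requests header, and then compare the status codes and Location headers. The sketch below assumes httr. `probe_redirect` is a hypothetical helper and is not part of any package. It has not been run against the live site.

```r
library(httr)

# Hypothetical helper (not part of httr): fetch `url` WITHOUT following
# redirects and report what the server answered.
probe_redirect <- function(url, ...) {
  resp <- GET(
    url,
    config(followlocation = 0L),  # stop at a 302 instead of following it
    ...,                          # extra config, e.g. add_headers(...)
    handle = handle(url)          # fresh handle: no cookies from earlier calls
  )
  list(
    status   = status_code(resp),
    location = headers(resp)[["location"]]  # NULL when there is no Location header
  )
}

# Usage (requires network access):
# pdf_url <- "http://www.worldvaluessurvey.org/wvsdc/DC00012/F00001316-WVS_2000_Questionnaire_Root.pdf"
# probe_redirect(pdf_url)
# probe_redirect(pdf_url, add_headers(`Upgrade-Insecure-Requests` = "1"))
```

Each call gets its own handle. This keeps any cookie set by the first probe from leaking into the second, because httr otherwise shares one handle per host.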

Solution

Using the excellent curlconverter package to mimic the browser, you can request the PDF directly.

First, mimic the browser's initial GET request (this may not be necessary; a simple GET that keeps the cookie may suffice):

library(curlconverter)
library(httr)
library(magrittr)  # for %>%, in case it is not already attached
browserGET <- "curl 'http://www.worldvaluessurvey.org/WVSDocumentationWV4.jsp' -H 'Host: www.worldvaluessurvey.org' -H 'User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:49.0) Gecko/20100101 Firefox/49.0' -H 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8' -H 'Accept-Language: en-US,en;q=0.5' --compressed -H 'Connection: keep-alive' -H 'Upgrade-Insecure-Requests: 1'"
# straighten() parses the curl string, make_req() turns it into a list of
# request functions; calling the first one performs the GET
getDATA <- (straighten(browserGET) %>% make_req)[[1]]()

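The JSESSIONID session cookie set by the server is then available in getDATA$cookies$value. Next, take the curl command the browser issued for the PDF itself and swap its captured cookie for the fresh one: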

getPDF <- "curl 'http://www.worldvaluessurvey.org/wvsdc/DC00012/F00001316-WVS_2000_Questionnaire_Root.pdf' -H 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8' -H 'Accept-Encoding: gzip, deflate' -H 'Accept-Language: en-US,en;q=0.5' -H 'Connection: keep-alive' -H 'Cookie: JSESSIONID=59558DE631D107B61F528C952FC6E21F' -H 'Host: www.worldvaluessurvey.org' -H 'Referer: http://www.worldvaluessurvey.org/AJDocumentationSmpl.jsp' -H 'Upgrade-Insecure-Requests: 1' -H 'User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:49.0) Gecko/20100101 Firefox/49.0'"
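# parse the browser's curl command into request parameters
appIP <- straighten(getPDF)
# replace the stale JSESSIONID captured in the browser with the fresh one from getDATA
appIP[[1]]$cookies$JSESSIONID <- getDATA$cookies$value[getDATA$cookies$name == "JSESSIONID"]
# build the request function and run it
appReq <- make_req(appIP)
response <- appReq[[1]]()
# fail loudly on an HTTP error instead of writing an error page to disk
stop_for_status(response)
# response$content holds the raw bytes of the body
writeBin(response$content, "test.pdf")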
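writeBin() will save whatever bytes came back, so an HTML error page can end up as test.pdf. Every PDF file starts with the signature `%PDF-`, which makes it cheap to check the result. The sketch below uses base R only. `is_pdf_file` is a hypothetical helper name.

```r
# Hypothetical helper: TRUE if the file exists and starts with the PDF
# signature "%PDF-", FALSE otherwise (e.g. an HTML error page).
is_pdf_file <- function(path) {
  if (!file.exists(path)) return(FALSE)
  magic <- readBin(path, what = "raw", n = 5L)  # read at most the first 5 bytes
  identical(magic, charToRaw("%PDF-"))
}
```

After the download, `stopifnot(is_pdf_file("test.pdf"))` turns a silently wrong file into an immediate error.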

The curl strings were copied straight from the browser's developer tools (the network panel's "Copy as cURL" option), and curlconverter then does all the work.
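The answer notes that mimicking the browser's first request may be unnecessary, and that a simple GET which keeps the cookie may suffice. A minimal sketch of that idea with plain httr follows. It assumes that the JSESSIONID cookie plus the Referer header is what the server checks. That assumption is untested, and the server might also insist on a browser-like User-Agent. `download_wvs_pdf` is a hypothetical helper. Its default URLs are taken from the curl commands above.

```r
library(httr)

# Hypothetical helper (not part of httr or curlconverter).
download_wvs_pdf <- function(
    pdf_url,
    dest,
    landing_url = "http://www.worldvaluessurvey.org/WVSDocumentationWV4.jsp",
    referer     = "http://www.worldvaluessurvey.org/AJDocumentationSmpl.jsp") {
  # One handle is reused for both requests, so a JSESSIONID cookie set by
  # the landing page stays in its cookie jar and is sent with the PDF request.
  h <- handle("http://www.worldvaluessurvey.org")

  landing <- GET(landing_url, handle = h)
  stop_for_status(landing)

  # httr follows redirects (such as a 302) by default; write_disk() streams
  # the response body straight to `dest`.
  resp <- GET(
    pdf_url,
    add_headers(Referer = referer),
    write_disk(dest, overwrite = TRUE),
    handle = h
  )
  stop_for_status(resp)
  invisible(dest)
}

# Usage (requires network access):
# download_wvs_pdf(
#   "http://www.worldvaluessurvey.org/wvsdc/DC00012/F00001316-WVS_2000_Questionnaire_Root.pdf",
#   dest = "test.pdf"
# )
```

The sketch keeps the cookie in an httr handle and never copies it by hand. If it fails where the curlconverter version succeeds, the missing ingredient is probably one of the other browser headers.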