R: download from an aspx page over https returns a web page instead of the CSV

Problem description

Warning: newbie here. I would appreciate some guidance. I am trying to learn how to use R to automate downloads.

What I need: to download data on shale gas wells from this website for all counties and reporting periods: https://www.paoilandgasreporting.state.pa.us/publicreports/Modules/Production/ProductionByCounty.aspx (Note that you may be asked to accept an agreement when entering; not a big deal.)

I can get to the page that lists all the CSV files I want to download. Unfortunately, that page has the same address as the one above. (You can choose a county and a reporting period and see for yourself.)

However, once on that page, the links that trigger the CSV downloads are listed. Each of them looks something like this: https://www.paoilandgasreporting.state.pa.us/publicreports/Modules/Production/ProductionByCountyExport.aspx?UNCONVENTIONAL_ONLY=false&INC_HOME_USE_WELLS=true&INC_NON_PRODUCING_WELLS=true&PERIOD=15AUGU&COUNTY=ALLEGHENY
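Since the export links seem to differ only in their query parameters, the URLs for each county and period could be generated rather than copied by hand. A minimal sketch, assuming the `PERIOD` and `COUNTY` values follow the pattern of the example link; `build_export_url` is a hypothetical helper, and the second county below is a placeholder rather than a value confirmed on the site:

```r
# Hypothetical helper: builds one export URL from a period code and a county.
# The base URL and parameter names are copied from the example link above.
build_export_url <- function(period, county) {
  base <- paste0(
    "https://www.paoilandgasreporting.state.pa.us/publicreports/",
    "Modules/Production/ProductionByCountyExport.aspx"
  )
  query <- c(
    UNCONVENTIONAL_ONLY = "false",
    INC_HOME_USE_WELLS = "true",
    INC_NON_PRODUCING_WELLS = "true",
    PERIOD = period,
    COUNTY = county
  )
  # URLencode() protects values containing spaces or other reserved characters.
  encoded <- vapply(query, utils::URLencode, character(1), reserved = TRUE)
  paste0(base, "?", paste(names(query), encoded, sep = "=", collapse = "&"))
}

# Placeholder inputs: only "15AUGU" and "ALLEGHENY" come from the example link.
periods  <- c("15AUGU")
counties <- c("ALLEGHENY", "GREENE")

# One row per county/period combination, with its export URL.
grid <- expand.grid(period = periods, county = counties, stringsAsFactors = FALSE)
grid$url <- mapply(build_export_url, grid$period, grid$county, USE.NAMES = FALSE)
```

Each `grid$url` entry can then be handed to whatever download step ends up working for this site.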

What I have tried:

library(downloader)

download("https://www.paoilandgasreporting.state.pa.us/publicreports/Modules/Production/ProductionByCountyExport.aspx?UNCONVENTIONAL_ONLY=false&INC_HOME_USE_WELLS=true&INC_NON_PRODUCING_WELLS=true&PERIOD=15AUGU&COUNTY=ALLEGHENY",
         destfile = "Prod_AUG15_Allegheny.csv")

I followed what another person did in this question: "Download documents from aspx web page in R".

The problem: this command saves a web page instead of the CSV file.

trying URL 'https://www.paoilandgasreporting.state.pa.us/publicreports/Modules/Production/ProductionByCountyExport.aspx?UNCONVENTIONAL_ONLY=false&INC_HOME_USE_WELLS=true&INC_NON_PRODUCING_WELLS=true&PERIOD=15AUGU&COUNTY=ALLEGHENY'
Content type 'text/html; charset=utf-8' length 11592 bytes (11 Kb)
opened URL
downloaded 11 Kb
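The `Content type 'text/html'` line in the log indicates that the server answered with a web page rather than a CSV. A quick way to confirm what was actually saved is to look at the start of the file. A minimal sketch; `looks_like_html` is a hypothetical helper, the check is only a heuristic, and the demo file contents are made up for illustration:

```r
# Hypothetical helper: TRUE if the first lines of a file look like an HTML page.
looks_like_html <- function(path, n = 20) {
  first_lines <- readLines(path, n = n, warn = FALSE)
  any(grepl("<!DOCTYPE html|<html", first_lines, ignore.case = TRUE))
}

# Self-contained demo with two temporary files (made-up contents).
html_file <- tempfile(fileext = ".csv")
writeLines(c("<!DOCTYPE html>", "<html><body>Agreement</body></html>"), html_file)
csv_file <- tempfile(fileext = ".csv")
writeLines(c("COUNTY,PERIOD", "ALLEGHENY,15AUGU"), csv_file)

looks_like_html(html_file)  # TRUE
looks_like_html(csv_file)   # FALSE
```

Running the same check on `Prod_AUG15_Allegheny.csv` would show whether the download produced a page or real data.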

The question: is this related to my page being https instead of http? Any guidance on how to solve it, or pointers to other relevant posts? (I found some posts on aspx downloads, but nothing helpful.)

Thanks in advance

Solution

@hrbrmstr It worked! Not the way I wanted at the beginning, but with RSelenium I could click the button to accept the agreement and actually open the download link.

Here is the code (it is simple, but it took me all day to figure out, what a shame):

# Using RSelenium to save the file
## Install the package if needed
install.packages("RSelenium")
## Load the package
library("RSelenium")
## Download and start the Selenium server
## (checkForServer() and startServer() belong to older RSelenium releases;
## newer releases provide rsDriver() instead)
checkForServer()
startServer()
# I had to start the server manually!
remDr <- remoteDriver()
remDr
remDr$open()
# Open the website and accept the conditions
remDr$navigate("https://www.paoilandgasreporting.state.pa.us/publicreports/Modules/Welcome/Agreement.aspx")
AgreeButton <- remDr$findElement(using = "id", value = "MainContent_AgreeButton")
AgreeButton$highlightElement()
AgreeButton$clickElement()

# Open the CSV export link in the same browser session
remDr$navigate("https://www.paoilandgasreporting.state.pa.us/publicreports/Modules/Production/ProductionByCountyExport.aspx?UNCONVENTIONAL_ONLY=false&INC_HOME_USE_WELLS=true&INC_NON_PRODUCING_WELLS=true&PERIOD=15AUGU&COUNTY=ALLEGHENY")

However, I am still not able to save the CSV file :-(. I know I need a command for "Save link as...", but I am asking about that in another topic related to RSelenium.

I will edit the answer when I find out!
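One possible way to get the file onto disk, offered as an untested sketch rather than a confirmed fix: once the agreement has been accepted in the browser session, copy that session's cookies into an ordinary HTTP request and let httr write the response to a file. This assumes the site only needs the session cookies to serve the CSV, which this thread does not confirm. `cookies_to_vector` and `save_export` are hypothetical helpers; `getAllCookies()` (RSelenium) and `GET()`, `set_cookies()`, `write_disk()`, `stop_for_status()`, `http_type()` (httr) are existing functions.

```r
# Turn the list returned by remDr$getAllCookies() into a named character vector.
cookies_to_vector <- function(cookie_list) {
  values <- vapply(cookie_list, function(ck) as.character(ck$value), character(1))
  names(values) <- vapply(cookie_list, function(ck) as.character(ck$name), character(1))
  values
}

# Hypothetical helper: request `url` with the browser's cookies and save the body.
# Assumes httr is installed and `remDr` has already accepted the agreement.
save_export <- function(remDr, url, destfile) {
  cookies <- cookies_to_vector(remDr$getAllCookies())
  response <- httr::GET(
    url,
    httr::set_cookies(.cookies = cookies),
    httr::write_disk(destfile, overwrite = TRUE)
  )
  httr::stop_for_status(response)
  # A "text/html" type here means the server sent a page again, not the CSV.
  httr::http_type(response)
}
```

Usage would be along the lines of `save_export(remDr, url, "Prod_AUG15_Allegheny.csv")`, where `url` is the export link opened above; checking the returned content type guards against silently saving the agreement page again.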
