Getting data in R as dataframe from web source


Question

I am trying to load some air pollution background data directly into R as a data.frame using the RCurl package.

The website in question has three dropdown boxes for choosing options before the .csv file is downloaded, as shown in the figure below:

I am trying to select three values from the dropdown boxes and download the data with the "Download CSV" button directly into R as a data.frame.

I want to download different combinations of multiple years and multiple pollutants for a specific site.

In other posts on StackOverflow I have come across the getForm function from the RCurl package, but I don't understand how to control the three dropdown boxes with it.

The URL of the data source is: http://uk-air.defra.gov.uk/data/laqm-background-maps?year=2011

Recommended answer

For this website you can construct a URL and submit a GET request to get the CSV directly:

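library(httr)

# Endpoint that serves the CSV download
baseURL <- "http://uk-air.defra.gov.uk/data/laqm-background-maps.php"
queryList <- parse_url(baseURL)
# One query parameter per dropdown (council, pollutant, year) plus the form fields
queryList$query <- list("bkgrd-la" = 359, "bkgrd-pollutant" = "no2", "bkgrd-year" = 2011,
                        action = "data", year = 2011, submit = "Download+CSV")
# Send the GET request and save the response body to temp.csv
res <- GET(build_url(queryList), write_disk("temp.csv"))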
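The request above only saves the file to disk; the question asks for a data.frame. A minimal sketch of the remaining step follows. The helper name `read_background_csv` is hypothetical, and the assumption that the column names sit on the first line of the file has not been checked against the real download, so `skip` may need adjusting.

```r
# Read a CSV saved by write_disk() into a data.frame.
# Assumption: the first line holds the column names. If the downloaded file
# starts with descriptive lines, set `skip` to the number of lines to drop.
read_background_csv <- function(path, skip = 0) {
  read.csv(path, skip = skip, stringsAsFactors = FALSE)
}

# Example call once temp.csv exists:
# dat <- read_background_csv("temp.csv")
```

Inspecting the first few lines with `readLines("temp.csv", n = 10)` is a quick way to decide on a value for `skip`.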

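The codes accepted by each query parameter can be read from the option elements of the corresponding dropdown on the page:

library(XML)
doc <- htmlParse("http://uk-air.defra.gov.uk/data/laqm-background-maps?year=2011")
# Each <option> under the council dropdown carries a numeric code (value) and a display name
councils <- doc["//*[@id='bkgrd-la']/option", fun = function(x){
  data.frame(value = xmlGetAttr(x, "value"), council = xmlValue(x))
  }]
# Stack the one-row data frames into a single lookup table
councils <- do.call(rbind.data.frame, councils)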
> head(councils)
  value                      council
1   359        Aberdeen City Council
2   360        Aberdeenshire Council
3     1        Adur District Council
4     2    Allerdale Borough Council
5     4 Amber Valley Borough Council
6   401      Anglesey County Council

# Same approach for the pollutant dropdown
pollutants <- doc["//*[@id='bkgrd-pollutant']/option", fun = function(x){
  data.frame(value = xmlGetAttr(x, "value"), pollutant = xmlValue(x))
}]
pollutants <- do.call(rbind.data.frame, pollutants)
> head(pollutants)
  value pollutant
1   no2       NO2
2   nox       NOx
3  pm10      PM10
4  pm25     PM2.5
5   no2       NO2
6   nox       NOx

etc.
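To cover several years and pollutants for one site, the same request can be repeated over a grid of parameter values. The sketch below is one possible way to do that, not a tested client for this site. The helper names `build_queries` and `fetch_background`, the output file naming, and the `skip` argument are all hypothetical. It also assumes that the plain `year` parameter should stay at 2011 as in the query above while only `bkgrd-year` varies; that has not been verified.

```r
# Build one row per (pollutant, year) combination for a single council code.
build_queries <- function(la, pollutants, years) {
  grid <- expand.grid(la = la, pollutant = pollutants, year = years,
                      stringsAsFactors = FALSE)
  # Hypothetical file naming scheme, one file per combination
  grid$file <- sprintf("bkgrd_%s_%s_%s.csv", grid$la, grid$pollutant, grid$year)
  grid
}

# Download one combination and return it as a data.frame (requires httr).
# Assumptions: `map_year` stays fixed at 2011 as in the original query, and
# `skip` may need adjusting to the actual layout of the file.
fetch_background <- function(la, pollutant, year, file, map_year = 2011, skip = 0) {
  url <- httr::modify_url(
    "http://uk-air.defra.gov.uk/data/laqm-background-maps.php",
    query = list("bkgrd-la" = la, "bkgrd-pollutant" = pollutant,
                 "bkgrd-year" = year, action = "data", year = map_year,
                 submit = "Download+CSV"))
  httr::GET(url, httr::write_disk(file, overwrite = TRUE))
  read.csv(file, skip = skip, stringsAsFactors = FALSE)
}

# Illustrative values: council code 359, two pollutants, two years
queries <- build_queries(359, c("no2", "pm10"), 2011:2012)

# Uncomment to run the downloads (needs network access):
# results <- Map(fetch_background, queries$la, queries$pollutant,
#                queries$year, queries$file)
```

`results` would then be a list with one data.frame per combination, which can be combined with `do.call(rbind, results)` if the columns match across files.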
