Weather data scraping and extraction in R


Problem description

I'm working on a research project and have been asked to do some data scraping: write R code that extracts the current temperature for a particular zip code from a site such as wunderground.com. This may be a somewhat abstract question, but here is what I have so far. I can extract the current temperature for a particular zip code like this:

    # Read the page source, one element per line
    temps <- readLines("http://www.wunderground.com/q/zmw:20904.1.99999")
    edit(temps)  # opens the source so I can find the line that contains the temperature
    ldata <- temps[lnumber]  # lnumber: index of that line, found by inspecting the source
    ldata
    # then a few gsub() calls extract just the numerical value
    # (57.8, for example) from that line
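
The hard-coded `lnumber` can be avoided. One option is to locate the line with `grep()` and pull the number out with a single regular expression instead of several `gsub()` calls. This is a minimal sketch, assuming the temperature appears on a line containing some identifiable marker. The `wx-value` marker, the sample page source and the helper name `extract_temp` are hypothetical stand-ins, not the site's actual markup.

```r
# Return the first number on the first line that contains `marker`,
# or NA if no line (or no number on that line) matches.
extract_temp <- function(lines, marker) {
  idx <- grep(marker, lines, fixed = TRUE)[1]
  if (is.na(idx)) return(NA_real_)
  hit <- regmatches(lines[idx], regexpr("-?[0-9]+(\\.[0-9]+)?", lines[idx]))
  if (length(hit) == 0) NA_real_ else as.numeric(hit)
}

# Hypothetical page source standing in for readLines(url);
# the marker "wx-value" is an assumption, not the real page markup.
page <- c("<html>", '<span class="wx-value">57.8</span>', "</html>")
extract_temp(page, "wx-value")
# [1] 57.8
```

With the real page, you would call `extract_temp(readLines(url), marker)`, where `marker` is whatever text reliably appears on the temperature line.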

I have a csv file containing the zip code of every city in the country, and I have imported it into R. It is arranged as a table with zip, city and state columns. My challenge now is to write a method (using a Java analogy here because I'm new to R) that takes 6-7 consecutive zip codes following a specified one. For each zip code, it should run the code above after modifying the link in the readLines call so that the respective zip code follows the link segment zmw:XXXXX.

I don't quite know how to extract the data from the table. Maybe with a for loop? But then I don't know how to use that to modify the link, and I think that's where I'm really stuck. I have a bit of Java background, so I understand how to approach this problem; I just don't know the syntax. I realize this is quite an abstract question since I didn't provide much code. I just want to know the functions and syntax that will help me extract the data from the table and use it to modify the link programmatically rather than by hand.
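
The table lookup and link substitution described above can be sketched in R. `match()` finds the row of the starting zip code, the following rows are taken by index, and `sprintf()` substitutes each zip code into the `zmw:` link. The column name `zip`, the small in-memory table, the helper names (`next_zips`, `zip_url`, `scrape_zips`) and the `process_url` callback are assumptions for illustration. The callback stands in for the `readLines()`/`gsub()` steps.

```r
# Hypothetical stand-in for the imported zip-code table. In practice it
# might come from read.csv("zips.csv", colClasses = c(zip = "character")),
# which reads zip codes as text so leading zeros are kept.
zips <- data.frame(
  zip   = c("20901", "20902", "20903", "20904", "20905", "20906", "20907"),
  city  = paste0("City", 1:7),
  state = "MD",
  stringsAsFactors = FALSE
)

# Zip codes in the (up to) n rows that follow `start_zip` in the table.
next_zips <- function(tbl, start_zip, n = 6) {
  i <- match(start_zip, tbl$zip)
  if (is.na(i)) stop("zip code not found: ", start_zip)
  rows <- seq(i + 1, length.out = n)
  tbl$zip[rows[rows <= nrow(tbl)]]
}

# Build the link by substituting the zip code after "zmw:".
zip_url <- function(zip) {
  sprintf("http://www.wunderground.com/q/zmw:%s.1.99999", zip)
}

# Java-style loop: build each URL and pass it to `process_url`,
# e.g. function(u) readLines(u) followed by the extraction step.
scrape_zips <- function(zip_codes, process_url) {
  results <- vector("list", length(zip_codes))
  for (k in seq_along(zip_codes)) {
    results[[k]] <- process_url(zip_url(zip_codes[k]))
  }
  names(results) <- zip_codes
  results
}

todo <- next_zips(zips, "20901", n = 6)
todo
# [1] "20902" "20903" "20904" "20905" "20906" "20907"
zip_url(todo[3])
# [1] "http://www.wunderground.com/q/zmw:20904.1.99999"
```

With network access, `scrape_zips(todo, function(u) readLines(u))` would fetch each page in turn. An explicit `for` loop is used here because it mirrors Java. `lapply()` or `sapply()` over `zip_url(todo)` is the more idiomatic R equivalent.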

Solution

So this is about the Weather Underground data.

You can download CSV files for individual weather stations from Weather Underground, but you need to know the station identifier. Here is an example URL for a weather station in Kirkland, WA (KWAKIRKL8):

http://www.wunderground.com/weatherstation/WXDailyHistory.asp?ID=KWAKIRKL8&day=31&month=1&year=2014&graphspan=day&format=1
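
Instead of editing this link by hand, you can substitute the station ID and date with `sprintf()`. This is a minimal sketch, assuming the query parameters keep the names and meaning they have in the example URL (`ID`, `day`, `month`, `year`, `graphspan`, `format`). The helper name `station_url` is hypothetical.

```r
# Build a daily-history CSV link for one station and one or more dates.
station_url <- function(id, date) {
  date <- as.Date(date)
  sprintf(paste0("http://www.wunderground.com/weatherstation/WXDailyHistory.asp",
                 "?ID=%s&day=%d&month=%d&year=%d&graphspan=day&format=1"),
          id,
          as.integer(format(date, "%d")),   # day of month, no leading zero
          as.integer(format(date, "%m")),   # month number
          as.integer(format(date, "%Y")))   # four-digit year
}

# Reproduces the example URL for KWAKIRKL8 on 31 January 2014
station_url("KWAKIRKL8", "2014-01-31")
```

Because `sprintf()` is vectorized, passing a date sequence such as `seq(as.Date("2014-01-25"), as.Date("2014-01-31"), by = "day")` returns one URL per day.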

Here is some R code:

  url <- 'http://www.wunderground.com/weatherstation/WXDailyHistory.asp?ID=KWAKIRKL8&day=31&month=1&year=2014&graphspan=day&format=1'
  s <- getURL(url)
  s <- gsub("<br>\n","",s)  
  wdf <- read.csv(con<-textConnection(s))
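
The parsing step can be tried without a network connection. The sketch below wraps the `gsub()`/`read.csv()` steps in a function (the name `parse_station_csv` is hypothetical) and feeds it a made-up response string. The column names and the exact placement of the `<br>` tags are assumptions, not the service's documented format. It uses `read.csv(text = ...)`, so there is no `textConnection()` to close afterwards.

```r
# Remove the "<br>" line breaks from the raw response and parse it as CSV.
parse_station_csv <- function(s) {
  clean <- gsub("<br>\n", "", s, fixed = TRUE)
  read.csv(text = clean, stringsAsFactors = FALSE)
}

# Made-up response text with the same "<br>" pattern the code above removes.
s <- paste0("Time,TemperatureF,Humidity\n<br>\n",
            "2014-01-31 00:05:00,41.2,88\n<br>\n",
            "2014-01-31 00:10:00,41.0,89\n<br>\n")
wdf <- parse_station_csv(s)
wdf$TemperatureF
# [1] 41.2 41.0
```

With a live connection, the same function can be applied directly to the result of `getURL(url)`.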

And here is a page where you can find stations and their codes manually:

http://www.wunderground.com/wundermap/

Since you only need a few, you can pick them out manually.
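
Once you have picked out a few station IDs, you can stack the per-station data frames into one table with `lapply()` and `do.call(rbind, ...)`. In this sketch, `combine_stations` and `fetch_station` are hypothetical. `fetch_station` stands for any function that returns one station's data frame, for example by building the URL and running the `getURL()`/`read.csv()` steps above. An offline stub is used so the example runs without a network connection, and the second station ID is a made-up placeholder.

```r
# Fetch each station's data, tag its rows with the station ID, and stack them.
combine_stations <- function(ids, fetch_station) {
  frames <- lapply(ids, function(id) {
    df <- fetch_station(id)
    df$station <- rep(id, nrow(df))   # one copy of the ID per row
    df
  })
  do.call(rbind, frames)
}

# Offline stub standing in for a real download.
fake_fetch <- function(id) data.frame(TemperatureF = c(40.1, 40.5))

all_obs <- combine_stations(c("KWAKIRKL8", "PLACEHOLDER1"), fake_fetch)
nrow(all_obs)
# [1] 4
```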
