Meteorological Data from XML to Dataframe in R


Problem description

I'm trying to analyze meteorological data by importing it into R directly from its native XML structure. But it seems to be a very complicated XML format that does not follow the common "one observation per row" convention. The data provider has grouped the variables by the ten-minute interval in which they were recorded.

Here is a piece of the XML code:

<?xml version= "1.0" encoding="ISO-8859-1" ?>
<mes xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="C069_2018_1.xsd">
    <dia Dia="2018-1-01">
        <hora Hora="00:00">
            <Meteoros>
                <Dir.Med._a_1800cm>250.5</Dir.Med._a_1800cm>
                <Humedad._a_170cm>43.94</Humedad._a_170cm>
                <Irradia.._a_273cm>0.0</Irradia.._a_273cm>
                <Precip.._a_144cm>0.0</Precip.._a_144cm>
                <Sig.Dir._a_1800cm>17.82</Sig.Dir._a_1800cm>
                <Sig.Vel._a_1800cm>2.78</Sig.Vel._a_1800cm>
                <Tem.Aire._a_170cm>12.57</Tem.Aire._a_170cm>
                <Vel.Max._a_1800cm>15.48</Vel.Max._a_1800cm>
                <Vel.Med._a_1800cm>8.6</Vel.Med._a_1800cm>
            </Meteoros>
        </hora>
        <hora Hora="00:10">
            <Meteoros>
                <Dir.Med._a_1800cm>249.3</Dir.Med._a_1800cm>
                <Humedad._a_170cm>44.65</Humedad._a_170cm>
                <Irradia.._a_273cm>0.0</Irradia.._a_273cm>
                <Precip.._a_144cm>0.0</Precip.._a_144cm>
                <Sig.Dir._a_1800cm>20.21</Sig.Dir._a_1800cm>
                <Sig.Vel._a_1800cm>2.32</Sig.Vel._a_1800cm>
                <Tem.Aire._a_170cm>12.55</Tem.Aire._a_170cm>
                <Vel.Max._a_1800cm>14.5</Vel.Max._a_1800cm>
                <Vel.Med._a_1800cm>7.8</Vel.Med._a_1800cm>
            </Meteoros>
        </hora>
        <hora Hora="00:20">
            <Meteoros>
                <Dir.Med._a_1800cm>250.3</Dir.Med._a_1800cm>
                <Humedad._a_170cm>46.17</Humedad._a_170cm>
                <Irradia.._a_273cm>0.0</Irradia.._a_273cm>
                <Precip.._a_144cm>0.0</Precip.._a_144cm>
                <Sig.Dir._a_1800cm>23.02</Sig.Dir._a_1800cm>
                <Sig.Vel._a_1800cm>2.25</Sig.Vel._a_1800cm>
                <Tem.Aire._a_170cm>12.45</Tem.Aire._a_170cm>
                <Vel.Max._a_1800cm>13.72</Vel.Max._a_1800cm>
                <Vel.Med._a_1800cm>5.55</Vel.Med._a_1800cm>
            </Meteoros>
        </hora>
...

And here is the full XML for the data of January 2018 (>60 MB):

http://opendata.euskadi.eus/contenidos/ds_meteorologicos/met_stations_ds_2018/opendata/2018/C069/C069_2018_1.xml

When I used the function "xmlTreeParse" I got the error:

"Error: XML content does not seem to be XML"
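
The question does not show the exact call, so the cause of the error cannot be pinned down here. As far as I know, the XML package raises this message when the input string is not recognised as an existing file, a supported URL, or literal XML. One way to sidestep that is to download the file once and parse the local copy. The sketch below does this with xml2; `read_station_xml` is a hypothetical helper name, and the tiny demo file only imitates the layout of the real data.

```r
library(xml2)

# Hypothetical helper (not from the original post): if `src` is a URL,
# download it to a temporary file first, then parse the local copy.
read_station_xml <- function(src) {
  path <- src
  if (grepl("^https?://", src)) {
    path <- tempfile(fileext = ".xml")
    # mode = "wb" keeps the bytes intact, so the encoding declared
    # in the XML header (ISO-8859-1) is still honoured by the parser
    download.file(src, path, mode = "wb", quiet = TRUE)
  }
  read_xml(path)
}

# Offline demonstration with a tiny file in the same layout
demo_path <- tempfile(fileext = ".xml")
writeLines(c(
  '<?xml version="1.0" encoding="ISO-8859-1" ?>',
  '<mes><dia Dia="2018-1-01"><hora Hora="00:00"><Meteoros>',
  '<Tem.Aire._a_170cm>12.57</Tem.Aire._a_170cm>',
  '</Meteoros></hora></dia></mes>'
), demo_path)

doc <- read_station_xml(demo_path)
root_name <- xml_name(doc)                      # name of the root element
n_records <- length(xml_find_all(doc, "//hora")) # number of ten-minute records
```

Passing the full URL from above to `read_station_xml()` should work the same way, at the cost of a download of more than 60 MB.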

It's my first attempt at working with an XML data structure, but I've been trying the recommendations from similar questions on this site, such as:

Transforming data from xml into R dataframe

xml to dataframe in r

R XML to Dataframe

But those seem to be simple XML structures that work just fine when parsed directly, or even when converted directly to dataframes, with the libraries "XML" and "methods".

I need to obtain a dataframe with similar structure to this:

dia hora    Dir.Med._a_1800cm   Humedad._a_170cm    Irradia.._a_273cm   Precip.._a_144cm    Sig.Dir._a_1800cm   Sig.Vel._a_1800cm   Tem.Aire._a_170cm   Vel.Max._a_1800cm   Vel.Med._a_1800cm
01/01/2018  0:00    250.5   43.94   0.0 0.0 17.82   2.78    12.57   15.48   8.6
01/01/2018  0:10    249.3   44.65   0.0 0.0 20.21   2.32    12.55   14.5    7.8
01/01/2018  0:20    250.3   46.17   0.0 0.0 23.02   2.25    12.45   13.72   5.55

Solution

It's quite some work, but not impossible. This solution also works when the number of observations differs from day to day.


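library(xml2)      # read_xml(), xml_find_all(), xml_attrs(), xml_children(), ...
library(purrr)     # map(), map_df(), pmap_df()
library(tibble)    # as_tibble()
library(magrittr)  # %>%

# xmlstr holds the XML document (literal XML text or a file path)
# One node per day, plus each day's attributes (Dia) as a one-row tibble
dia <- xmlstr %>% read_xml() %>% xml_find_all("//dia")
dia.dat <- dia %>% map(xml_attrs) %>% map(~t(.) %>% as_tibble)

# The <hora> children of each day, plus their attributes (Hora)
hora <- dia %>% map(xml_children)
hora.dat <- hora %>% map(xml_attrs) %>% map(~map_df(., ~t(.) %>% as_tibble))

# The measurements under each <hora>/<Meteoros>: one row per ten-minute record
hora.details <- hora %>%
  map(~map(., xml_children) %>%
        map(xml_children) %>%
        map(~setNames(xml_text(.), xml_name(.)) %>% t() %>% as_tibble)) %>%
  map(~do.call(rbind, .) %>% as_tibble)

# Bind day, time and measurements column-wise per day, then stack all days
pmap_df(list(dia.dat, hora.dat, hora.details), cbind)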
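
For comparison, here is a flatter variant of the same idea. It is an alternative sketch, not the code from the answer: it walks over every `<hora>` node and reads the day from that node's parent. It assumes every `<hora>` holds the same set of variables under `<Meteoros>`, since `rbind` needs identical columns. The `xmlstr` below is a shortened sample in the layout of the question, and `hora_to_row` is a hypothetical helper name.

```r
library(xml2)

# Shortened sample in the layout of the question (two variables only)
xmlstr <- '<mes>
  <dia Dia="2018-1-01">
    <hora Hora="00:00"><Meteoros>
      <Tem.Aire._a_170cm>12.57</Tem.Aire._a_170cm>
      <Vel.Med._a_1800cm>8.6</Vel.Med._a_1800cm>
    </Meteoros></hora>
    <hora Hora="00:10"><Meteoros>
      <Tem.Aire._a_170cm>12.55</Tem.Aire._a_170cm>
      <Vel.Med._a_1800cm>7.8</Vel.Med._a_1800cm>
    </Meteoros></hora>
  </dia>
  <dia Dia="2018-1-02">
    <hora Hora="00:00"><Meteoros>
      <Tem.Aire._a_170cm>12.57</Tem.Aire._a_170cm>
      <Vel.Med._a_1800cm>8.6</Vel.Med._a_1800cm>
    </Meteoros></hora>
  </dia>
</mes>'

doc <- read_xml(xmlstr)
horas <- xml_find_all(doc, "//hora")

# Build a one-row data frame for a single <hora> node
hora_to_row <- function(h) {
  vars <- xml_find_all(h, "./Meteoros/*")
  data.frame(
    Dia = xml_attr(xml_parent(h), "Dia"),   # the day sits on the parent <dia>
    Hora = xml_attr(h, "Hora"),
    as.list(setNames(xml_text(vars), xml_name(vars))),
    check.names = FALSE,
    stringsAsFactors = FALSE
  )
}

# Stack all rows; every column is still character at this point
obs <- do.call(rbind, lapply(horas, hora_to_row))
print(obs)
```

Because each row looks up its own day, this variant needs no separate bookkeeping for days with different numbers of records.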

I added some data to your XML example to test it (one extra day, and an extra hour on the second day).

Result:


        Dia  Hora Dir.Med._a_1800cm Humedad._a_170cm Irradia.._a_273cm Precip.._a_144cm Sig.Dir._a_1800cm
1 2018-1-01 00:00             250.5            43.94               0.0              0.0             17.82
2 2018-1-01 00:10             249.3            44.65               0.0              0.0             20.21
3 2018-1-01 00:20             250.3            46.17               0.0              0.0             23.02
4 2018-1-02 00:00             250.5            43.94               0.0              0.0             17.82
5 2018-1-02 00:10             249.3            44.65               0.0              0.0             20.21
6 2018-1-02 00:20             250.3            46.17               0.0              0.0             23.02
7 2018-1-02 00:30             250.3            46.17               0.0              0.0             23.02
  Sig.Vel._a_1800cm Tem.Aire._a_170cm Vel.Max._a_1800cm Vel.Med._a_1800cm
1              2.78             12.57             15.48               8.6
2              2.32             12.55              14.5               7.8
3              2.25             12.45             13.72              5.55
4              2.78             12.57             15.48               8.6
5              2.32             12.55              14.5               7.8
6              2.25             12.45             13.72              5.55
7              2.25             12.45             13.72              5.55
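
All values in this result are text, because they come from `xml_text()`. Before any analysis they usually need converting. The sketch below shows one way to do that in base R on a small stand-in data frame: `obs` and the `timestamp` column are hypothetical names, and the UTC time zone is an assumption, since the source does not state which zone the station uses.

```r
# Small stand-in for the parsed result: every column is character
obs <- data.frame(
  Dia = c("2018-1-01", "2018-1-01", "2018-1-02"),
  Hora = c("00:00", "00:10", "00:00"),
  Tem.Aire._a_170cm = c("12.57", "12.55", "12.57"),
  Vel.Med._a_1800cm = c("8.6", "7.8", "8.6"),
  stringsAsFactors = FALSE
)

# Combine day and time into one timestamp (time zone is an assumption)
obs$timestamp <- as.POSIXct(
  paste(obs$Dia, obs$Hora),
  format = "%Y-%m-%d %H:%M",
  tz = "UTC"
)

# Convert every measurement column from character to numeric
value_cols <- setdiff(names(obs), c("Dia", "Hora", "timestamp"))
obs[value_cols] <- lapply(obs[value_cols], as.numeric)

str(obs)
```

Selecting the measurement columns by exclusion means the same lines work unchanged when the file contains more variables than this sample.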

Credits to the answers to:

converting XML nodes to a dataframe

Getting all the children nodes of XML file to data.frame or data.table
