如何在R中正确解析和扩展我的XML ID [英] How can I parse and expand correctly my XML ID in R

查看:96
本文介绍了如何在R中正确解析和扩展我的XML ID的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试解析ID标识的树中的孩子.这是我的XML的样子:

I am trying to parse the child in a tree identify by an ID. This is how my XML looks like:

       <Reporte xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <nombre>PML</nombre>
  <proceso>MDA</proceso>
  <sistema>BCS</sistema>
  <area>PÚBLICA</area>
  <Resultados>
    <Nodo>
      <clv_nodo>07CAB-115</clv_nodo>
      <Valores>
        <Valor>
          <fecha>2017-03-15</fecha>
          <hora>1</hora>
          <pml>1688.02</pml>
          <pml_ene>1638.38</pml_ene>
          <pml_per>49.64</pml_per>
          <pml_cng>0</pml_cng>
        </Valor>
        <Valor>
          <fecha>2017-03-15</fecha>
          <hora>2</hora>
          <pml>1446.18</pml>
          <pml_ene>1405.81</pml_ene>
          <pml_per>40.36</pml_per>
          <pml_cng>0</pml_cng>
        </Valor>
        <Valor>
          <fecha>2017-03-15</fecha>
          <hora>3</hora>
          <pml>1389.31</pml>
          <pml_ene>1351.85</pml_ene>
          <pml_per>37.46</pml_per>
          <pml_cng>0</pml_cng>
        </Valor>
        <Valor>
          <fecha>2017-03-15</fecha>
          <hora>4</hora>
          <pml>1337.1</pml>
          <pml_ene>1301.93</pml_ene>
          <pml_per>35.17</pml_per>
          <pml_cng>0</pml_cng>
        </Valor>
        <Valor>
          <fecha>2017-03-15</fecha>
          <hora>5</hora>
          <pml>1532.75</pml>
          <pml_ene>1492.39</pml_ene>
          <pml_per>40.36</pml_per>
          <pml_cng>0</pml_cng>
        </Valor>
        <Valor>
          <fecha>2017-03-15</fecha>
          <hora>6</hora>
          <pml>1729.85</pml>
          <pml_ene>1683.15</pml_ene>
          <pml_per>46.71</pml_per>
          <pml_cng>0</pml_cng>
        </Valor>
        <Valor>
          <fecha>2017-03-15</fecha>
          <hora>7</hora>
          <pml>1698.2</pml>
          <pml_ene>1650.29</pml_ene>
          <pml_per>47.92</pml_per>
          <pml_cng>0</pml_cng>
        </Valor>
        <Valor>
          <fecha>2017-03-15</fecha>
          <hora>8</hora>
          <pml>1700.84</pml>
          <pml_ene>1649.62</pml_ene>
          <pml_per>51.23</pml_per>
          <pml_cng>0</pml_cng>
        </Valor>
        <Valor>
          <fecha>2017-03-15</fecha>
          <hora>9</hora>
          <pml>1708.53</pml>
          <pml_ene>1652.13</pml_ene>
          <pml_per>56.4</pml_per>
          <pml_cng>0</pml_cng>
        </Valor>
        <Valor>
          <fecha>2017-03-15</fecha>
          <hora>10</hora>
          <pml>1798.48</pml>
          <pml_ene>1735.19</pml_ene>
          <pml_per>63.29</pml_per>
          <pml_cng>0</pml_cng>
        </Valor>
        <Valor>
          <fecha>2017-03-15</fecha>
          <hora>11</hora>
          <pml>1656.64</pml>
          <pml_ene>1595.8</pml_ene>
          <pml_per>60.84</pml_per>
          <pml_cng>0</pml_cng>
        </Valor>
        <Valor>
          <fecha>2017-03-15</fecha>
          <hora>12</hora>
          <pml>1712.9</pml>
          <pml_ene>1648.41</pml_ene>
          <pml_per>64.48</pml_per>
          <pml_cng>0</pml_cng>
        </Valor>
        <Valor>
          <fecha>2017-03-15</fecha>
          <hora>13</hora>
          <pml>1787.72</pml>
          <pml_ene>1719.13</pml_ene>
          <pml_per>68.59</pml_per>
          <pml_cng>0</pml_cng>
        </Valor>
        <Valor>
          <fecha>2017-03-15</fecha>
          <hora>14</hora>
          <pml>1851.01</pml>
          <pml_ene>1779.59</pml_ene>
          <pml_per>71.43</pml_per>
          <pml_cng>0</pml_cng>
        </Valor>
        <Valor>
          <fecha>2017-03-15</fecha>
          <hora>15</hora>
          <pml>1950.51</pml>
          <pml_ene>1873.83</pml_ene>
          <pml_per>76.67</pml_per>
          <pml_cng>0</pml_cng>
        </Valor>
        <Valor>
          <fecha>2017-03-15</fecha>
          <hora>16</hora>
          <pml>1661.94</pml>
          <pml_ene>1595.87</pml_ene>
          <pml_per>66.07</pml_per>
          <pml_cng>0</pml_cng>
        </Valor>
        <Valor>
          <fecha>2017-03-15</fecha>
          <hora>17</hora>
          <pml>1740.8</pml>
          <pml_ene>1671.24</pml_ene>
          <pml_per>69.56</pml_per>
          <pml_cng>0</pml_cng>
        </Valor>
        <Valor>
          <fecha>2017-03-15</fecha>
          <hora>18</hora>
          <pml>1895.51</pml>
          <pml_ene>1820.19</pml_ene>
          <pml_per>75.32</pml_per>
          <pml_cng>0</pml_cng>
        </Valor>
        <Valor>
          <fecha>2017-03-15</fecha>
          <hora>19</hora>
          <pml>2074.18</pml>
          <pml_ene>1990.16</pml_ene>
          <pml_per>84.02</pml_per>
          <pml_cng>0</pml_cng>
        </Valor>
        <Valor>
          <fecha>2017-03-15</fecha>
          <hora>20</hora>
          <pml>1959.91</pml>
          <pml_ene>1878.22</pml_ene>
          <pml_per>81.7</pml_per>
          <pml_cng>0</pml_cng>
        </Valor>
        <Valor>
          <fecha>2017-03-15</fecha>
          <hora>21</hora>
          <pml>1791.66</pml>
          <pml_ene>1719.24</pml_ene>
          <pml_per>72.43</pml_per>
          <pml_cng>0</pml_cng>
        </Valor>
        <Valor>
          <fecha>2017-03-15</fecha>
          <hora>22</hora>
          <pml>1986.59</pml>
          <pml_ene>1909.79</pml_ene>
          <pml_per>76.8</pml_per>
          <pml_cng>0</pml_cng>
        </Valor>
        <Valor>
          <fecha>2017-03-15</fecha>
          <hora>23</hora>
          <pml>1709.51</pml>
          <pml_ene>1648.21</pml_ene>
          <pml_per>61.29</pml_per>
          <pml_cng>0</pml_cng>
        </Valor>
        <Valor>
          <fecha>2017-03-15</fecha>
          <hora>24</hora>
          <pml>1539.04</pml>
          <pml_ene>1488.58</pml_ene>
          <pml_per>50.47</pml_per>
          <pml_cng>0</pml_cng>
        </Valor>
      </Valores>
    </Nodo>
    <Nodo>
      <clv_nodo>07BLE-115</clv_nodo>
      <Valores>
        <Valor>
          <fecha>2017-03-15</fecha>
          <hora>1</hora>
          <pml>1646.19</pml>
          <pml_ene>1638.38</pml_ene>
          <pml_per>7.81</pml_per>
          <pml_cng>0</pml_cng>
        </Valor>
        <Valor>
          <fecha>2017-03-15</fecha>
          <hora>2</hora>
          <pml>1413.36</pml>
          <pml_ene>1405.81</pml_ene>
          <pml_per>7.55</pml_per>
          <pml_cng>0</pml_cng>
        </Valor>
        <Valor>
          <fecha>2017-03-15</fecha>
          <hora>3</hora>
          <pml>1358.96</pml>
          <pml_ene>1351.85</pml_ene>
          <pml_per>7.11</pml_per>
          <pml_cng>0</pml_cng>
        </Valor>
        <Valor>
          <fecha>2017-03-15</fecha>
          <hora>4</hora>
          <pml>1308.71</pml>
          <pml_ene>1301.93</pml_ene>
          <pml_per>6.77</pml_per>
          <pml_cng>0</pml_cng>
        </Valor>
        <Valor>
          <fecha>2017-03-15</fecha>
          <hora>5</hora>
          <pml>1499.65</pml>
          <pml_ene>1492.39</pml_ene>
          <pml_per>7.27</pml_per>
          <pml_cng>0</pml_cng>
        </Valor>
        <Valor>
          <fecha>2017-03-15</fecha>
          <hora>6</hora>
          <pml>1690.93</pml>
          <pml_ene>1683.15</pml_ene>
          <pml_per>7.78</pml_per>
          <pml_cng>0</pml_cng>
        </Valor>
        <Valor>
          <fecha>2017-03-15</fecha>
          <hora>7</hora>
          <pml>1659.78</pml>
          <pml_ene>1650.29</pml_ene>
          <pml_per>9.49</pml_per>
          <pml_cng>0</pml_cng>
        </Valor>
        <Valor>
          <fecha>2017-03-15</fecha>
          <hora>8</hora>
          <pml>1661.41</pml>
          <pml_ene>1649.62</pml_ene>
          <pml_per>11.79</pml_per>
          <pml_cng>0</pml_cng>
        </Valor>
        <Valor>
          <fecha>2017-03-15</fecha>
          <hora>9</hora>
          <pml>1662.14</pml>
          <pml_ene>1652.13</pml_ene>
          <pml_per>10</pml_per>
          <pml_cng>0</pml_cng>
        </Valor>
        <Valor>
          <fecha>2017-03-15</fecha>
          <hora>10</hora>
          <pml>1744.71</pml>
          <pml_ene>1735.19</pml_ene>
          <pml_per>9.52</pml_per>
          <pml_cng>0</pml_cng>
        </Valor>
        <Valor>
          <fecha>2017-03-15</fecha>
          <hora>11</hora>
          <pml>1604.34</pml>
          <pml_ene>1595.8</pml_ene>
          <pml_per>8.54</pml_per>
          <pml_cng>0</pml_cng>
        </Valor>
        <Valor>
          <fecha>2017-03-15</fecha>
          <hora>12</hora>
          <pml>1657.17</pml>
          <pml_ene>1648.41</pml_ene>
          <pml_per>8.76</pml_per>
          <pml_cng>0</pml_cng>
        </Valor>
        <Valor>
          <fecha>2017-03-15</fecha>
          <hora>13</hora>
          <pml>1728.18</pml>
          <pml_ene>1719.13</pml_ene>
          <pml_per>9.05</pml_per>
          <pml_cng>0</pml_cng>
        </Valor>
        <Valor>
          <fecha>2017-03-15</fecha>
          <hora>14</hora>
          <pml>1789.15</pml>
          <pml_ene>1779.59</pml_ene>
          <pml_per>9.56</pml_per>
          <pml_cng>0</pml_cng>
        </Valor>
        <Valor>
          <fecha>2017-03-15</fecha>
          <hora>15</hora>
          <pml>1883.75</pml>
          <pml_ene>1873.83</pml_ene>
          <pml_per>9.92</pml_per>
          <pml_cng>0</pml_cng>
        </Valor>
        <Valor>
          <fecha>2017-03-15</fecha>
          <hora>16</hora>
          <pml>1599.54</pml>
          <pml_ene>1595.87</pml_ene>
          <pml_per>3.67</pml_per>
          <pml_cng>0</pml_cng>
        </Valor>
        <Valor>
          <fecha>2017-03-15</fecha>
          <hora>17</hora>
          <pml>1674.73</pml>
          <pml_ene>1671.24</pml_ene>
          <pml_per>3.49</pml_per>
          <pml_cng>0</pml_cng>
        </Valor>
        <Valor>
          <fecha>2017-03-15</fecha>
          <hora>18</hora>
          <pml>1823.54</pml>
          <pml_ene>1820.19</pml_ene>
          <pml_per>3.35</pml_per>
          <pml_cng>0</pml_cng>
        </Valor>
        <Valor>
          <fecha>2017-03-15</fecha>
          <hora>19</hora>
          <pml>1992.57</pml>
          <pml_ene>1990.16</pml_ene>
          <pml_per>2.41</pml_per>
          <pml_cng>0</pml_cng>
        </Valor>
        <Valor>
          <fecha>2017-03-15</fecha>
          <hora>20</hora>
          <pml>1885.48</pml>
          <pml_ene>1878.22</pml_ene>
          <pml_per>7.26</pml_per>
          <pml_cng>0</pml_cng>
        </Valor>
        <Valor>
          <fecha>2017-03-15</fecha>
          <hora>21</hora>
          <pml>1726.7</pml>
          <pml_ene>1719.24</pml_ene>
          <pml_per>7.46</pml_per>
          <pml_cng>0</pml_cng>
        </Valor>
        <Valor>
          <fecha>2017-03-15</fecha>
          <hora>22</hora>
          <pml>1914.07</pml>
          <pml_ene>1909.79</pml_ene>
          <pml_per>4.28</pml_per>
          <pml_cng>0</pml_cng>
        </Valor>
        <Valor>
          <fecha>2017-03-15</fecha>
          <hora>23</hora>
          <pml>1653.54</pml>
          <pml_ene>1648.21</pml_ene>
          <pml_per>5.32</pml_per>
          <pml_cng>0</pml_cng>
        </Valor>
        <Valor>
          <fecha>2017-03-15</fecha>
          <hora>24</hora>
          <pml>1495.33</pml>
          <pml_ene>1488.58</pml_ene>
          <pml_per>6.75</pml_per>
          <pml_cng>0</pml_cng>
        </Valor>
      </Valores>
    </Nodo>
  </Resultados>
  <status>OK</status>
</Reporte>

我正在尝试通过节点索引解析"Nodo" 数据并将其扩展为其子"Valor" 元素的数量,然后使用进行列绑定勇气" 数据.
我从字面上使用这篇文章的描述: [ R XML-组合父级和子级节点放入数据框

I am trying to parse "Nodo" data by node index and expand it to the number of its child "Valor" elements, then column bind with "Valor" data.
I am literally using the description from this post: [R XML - combining parent and child nodes into data frame

到目前为止,我的代码如下:

So far my code looks like this:

library(XML)
library(plyr)
library(purrr)
api <- function(path) {
  url1 <- modify_url("https://ws01.cenace.gob.mx", port = "8082", path = path)
  GET(url1)
}
resp <- api("/SWPML/SIM/BCS/MDA/07CAB-115,07BLE-115/2017/03/15/2017/03/15/XML")
url1 <- xmlParse(resp)
mtg_num <- length(xpathSApply(url1, "//Nodo"))
#I am using same names as example
#meeting_list is a 0X0 list
meeting_list <- lapply(seq(mtg_num), function(i) {
  races_num <- length(xpathSApply(url1, sprintf("//Resultados[%s]/Valores", i)))

  data.frame(
    meeting_id = rep(xpathSApply(url1, sprintf("//clv_nodo", i)), races_num)
  )
})
final_df <- cbind(do.call(rbind, meeting_list),
                  xmlToDataFrame(nodes = getNodeSet(url1, "//Valores/Valor")),
                  XML:::xmlAttrsToDataFrame(getNodeSet(url1, "//Valores/Valor")))

因为当我尝试将Meeting_list绑定到"Valor"时它是0X0,所以它给了我一个错误:

Since meeting_list is 0X0 when I try to bind it to "Valor" it gives me an error:

Error in data.frame(..., check.names = FALSE) : 
  arguments imply differing number of rows: 0, 48
In addition: Warning message:
In combineNamedVectors(lapply(doc, xmlAttrs), attrs, omit, ...) :
  no elements to combine across records

如何正确解析"clv_nodo"并将其扩展为"Valores"中的48个值.

How can I possible parse correctly "clv_nodo" and expand it to the 48 values in "Valores".

我的愿望数据框如下:

   clv_nodo       fecha hora     pml pml_ene pml_per pml_cng
1  07CAB-115 2017-03-15    1 1688.02 1638.38   49.64       0
2  07CAB-115 2017-03-15    2 1446.18 1405.81   40.36       0
3  07CAB-115 2017-03-15    3 1389.31 1351.85   37.46       0
4  07CAB-115 2017-03-15    4  1337.1 1301.93   35.17       0
5  07CAB-115 2017-03-15    5 1532.75 1492.39   40.36       0
6  07CAB-115 2017-03-15    6 1729.85 1683.15   46.71       0
7  07CAB-115 2017-03-15    7  1698.2 1650.29   47.92       0
8  07CAB-115 2017-03-15    8 1700.84 1649.62   51.23       0
9  07CAB-115 2017-03-15    9 1708.53 1652.13    56.4       0
10 07CAB-115 2017-03-15   10 1798.48 1735.19   63.29       0
11 07CAB-115 2017-03-15   11 1656.64  1595.8   60.84       0
12 07CAB-115 2017-03-15   12  1712.9 1648.41   64.48       0
13 07CAB-115 2017-03-15   13 1787.72 1719.13   68.59       0
14 07CAB-115 2017-03-15   14 1851.01 1779.59   71.43       0
15 07CAB-115 2017-03-15   15 1950.51 1873.83   76.67       0
16 07CAB-115 2017-03-15   16 1661.94 1595.87   66.07       0
17 07CAB-115 2017-03-15   17  1740.8 1671.24   69.56       0
18 07CAB-115 2017-03-15   18 1895.51 1820.19   75.32       0
19 07CAB-115 2017-03-15   19 2074.18 1990.16   84.02       0
20 07CAB-115 2017-03-15   20 1959.91 1878.22    81.7       0
21 07CAB-115 2017-03-15   21 1791.66 1719.24   72.43       0
22 07CAB-115 2017-03-15   22 1986.59 1909.79    76.8       0
23 07CAB-115 2017-03-15   23 1709.51 1648.21   61.29       0
24 07CAB-115 2017-03-15   24 1539.04 1488.58   50.47       0
25 07BLE-115 2017-03-15    1 1646.19 1638.38    7.81       0
26 07BLE-115 2017-03-15    2 1413.36 1405.81    7.55       0
27 07BLE-115 2017-03-15    3 1358.96 1351.85    7.11       0
28 07BLE-115 2017-03-15    4 1308.71 1301.93    6.77       0
29 07BLE-115 2017-03-15    5 1499.65 1492.39    7.27       0
30 07BLE-115 2017-03-15    6 1690.93 1683.15    7.78       0
31 07BLE-115 2017-03-15    7 1659.78 1650.29    9.49       0
32 07BLE-115 2017-03-15    8 1661.41 1649.62   11.79       0
33 07BLE-115 2017-03-15    9 1662.14 1652.13      10       0
34 07BLE-115 2017-03-15   10 1744.71 1735.19    9.52       0
35 07BLE-115 2017-03-15   11 1604.34  1595.8    8.54       0
36 07BLE-115 2017-03-15   12 1657.17 1648.41    8.76       0
37 07BLE-115 2017-03-15   13 1728.18 1719.13    9.05       0
38 07BLE-115 2017-03-15   14 1789.15 1779.59    9.56       0
39 07BLE-115 2017-03-15   15 1883.75 1873.83    9.92       0
40 07BLE-115 2017-03-15   16 1599.54 1595.87    3.67       0
41 07BLE-115 2017-03-15   17 1674.73 1671.24    3.49       0
42 07BLE-115 2017-03-15   18 1823.54 1820.19    3.35       0
43 07BLE-115 2017-03-15   19 1992.57 1990.16    2.41       0
44 07BLE-115 2017-03-15   20 1885.48 1878.22    7.26       0
45 07BLE-115 2017-03-15   21  1726.7 1719.24    7.46       0
46 07BLE-115 2017-03-15   22 1914.07 1909.79    4.28       0
47 07BLE-115 2017-03-15   23 1653.54 1648.21    5.32       0
48 07BLE-115 2017-03-15   24 1495.33 1488.58    6.75       0

更新最终答案

Dave2e的代码对于本示例中要解析的两个IDS(07CAB-115、07BLE-115)非常有效,但我需要解析2500个不同的IDS.其中一些在"Valores"节点上具有空节点.例如这个例子:

Dave2e's code worked perfectly for the two IDS that I am parsing in this example (07CAB-115, 07BLE-115) but I need to parse 2500 different IDS. Some of them have empty nodes at a "Valores" node. For example for this one:

<Nodo>
      <clv_nodo>07ASJ-115</clv_nodo>
      <Valores> 

所以当我运行Dave2e代码时,我得到了:

so when I ran Dave2e code I got:

 Error in data.frame(..., check.names = FALSE) : 
  arguments imply differing number of rows: 1, 0

这是因为在最后一部分中,我得到了一个嵌套列表,其中最后一个是0x0 df<- bind_rows(valornodes),所以我将ID与该空列表绑定在一起.正如Dave2e所建议的那样,解决方案是用Valores中的空节点过滤clvs.这样,嵌套列表和ID列表之间的匹配是正确的.这是最终代码:

This is because in the last part I get a nested list with the last one being 0x0 df<- bind_rows(valornodes) then I was binding the ID with this empty list. The solution, as Dave2e suggested, was filtering clvs by the empty nodes in Valores. In this way the match between the nested list and the list of ID's is correct. This is the final code:

 api <- function(path) {
      url1 <- modify_url("https://ws01.cenace.gob.mx", port = "8082", path = path)
      GET(url1)
    }
    resp <- api("/SWPML/SIM/BCS/MDA/07CAB-115,07BLE-115,07CAD-115,07ASJ-115/2017/03/26/2017/04/01/XML")
z <- read_xml(resp)
parents <-xml_find_all(z, ".//Nodo")
dfs<-lapply(parents, function(node){
  #find clvs name
  clvs <-xml_find_all(node, ".//clv_nodo") %>% xml_text()
  #Find all children
  valors <- node %>% xml_find_all(".//Valor")
#Find all children in Valores  
val <- node %>% xml_find_all(".//Valores")

#Filter clvs by empty nodes in Valores  
clvs <- clvs[xml_length(val)>0]


  #remove cases where the valors nodes have no children nodes
  valors <- valors[xml_length(valors)>0]

  valornodes <- lapply(valors, function(node){
    #get values and names
    values <- xml_children(node) %>% xml_text()
    names <- xml_children(node) %>% xml_name()

    #make data.frame and name the columns
    tempdf<- data.frame(t(values), stringsAsFactors = FALSE)
    names(tempdf) <- names
    tempdf
  })
  #made data frame with all of results
  df<- bind_rows(valornodes)
  df<- cbind(clvs,df)
df

})

q <- do.call(rbind.data.frame, dfs)

推荐答案

以下是使用xml2包的解决方案.
此解决方案假定非常Valor节点具有相同的子节点.如果不是这种情况,则此解决方案将失败.

Here is a solution using the xml2 package.
This solution assumes very Valor node has the same subnodes. If that is not the case then this solution will fail.

library(xml2)
library(dplyr)

#find parent nodes
parents <-xml_find_all(page, ".//Nodo")

#parse each child
dfs<-lapply(parents, function(node){
  #find clvs name
  clvs <-xml_find_all(node, ".//clv_nodo") %>% xml_text()
  #Find all children
  valors <- node %>% xml_find_all(".//Valor")

  #remove cases where the valors nodes have no children nodes
   valors <- valors[xml_length(valors)>0]

  #get values
  values <- xml_children(valors) %>% xml_text()

  #made data frame with results (assumes no missing child nodes)
  df<- data.frame(matrix(values, ncol=6, byrow=TRUE), stringsAsFactors = FALSE)
  #get node names and rename columns
  names(df)<-node %>% xml_find_first(".//Valor")  %>%  xml_children() %>% xml_name()
  df<- cbind(clvs, df)   #bind on clvs name

  df
})

#Make find answer
answer<-bind_rows(dfs)
head(answer)
#        clvs      fecha hora     pml pml_ene pml_per pml_cng
# 1 07CAB-115 2017-03-15    1 1688.02 1638.38   49.64       0
# 2 07CAB-115 2017-03-15    2 1446.18 1405.81   40.36       0
# 3 07CAB-115 2017-03-15    3 1389.31 1351.85   37.46       0
# 4 07CAB-115 2017-03-15    4  1337.1 1301.93   35.17       0
# 5 07CAB-115 2017-03-15    5 1532.75 1492.39   40.36       0
# 6 07CAB-115 2017-03-15    6 1729.85 1683.15   46.71       0

更新
Valor的子节点数更改或不一致.然后用下面的脚本替换上面的lapply.它确实成为一个循环中的一个循环,因此会影响性能.

Update
If the number of sub nodes to Valor changes or is not consistent. Then substitute in the below script in for the lapply above. It does become a loop within a loop so performance will be impacted.

#parse each child
dfs<-lapply(parents, function(node){
  #find clvs name
  clvs <-xml_find_all(node, ".//clv_nodo") %>% xml_text()
  #Find all children
  valors <- node %>% xml_find_all(".//Valor")

  #remove cases where the valors nodes have no children nodes
   valors <- valors[xml_length(valors)>0]

  valornodes <- lapply(valors, function(node){
    #get values and names
    values <- xml_children(node) %>% xml_text()
    names <- xml_children(node) %>% xml_name()

    #make data.frame and name the columns
    tempdf<- data.frame(t(values), stringsAsFactors = FALSE)
    names(tempdf) <- names
    tempdf
  })
  #made data frame with all of results
  df<- bind_rows(valornodes)

  df<- cbind(clvs, df)
  df
})

这篇关于如何在R中正确解析和扩展我的XML ID的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆