R XML: combining parent and child nodes (with the same name) into a data frame


Problem description

I have XML like this:

<SoccerFeed timestamp="20181123T153249+0000">
  <SoccerDocument season_name="Season 2016/2017" season_id="2016" competition_name="French Ligue 1" competition_id="24" competition_code="FR_L1" Type="SQUADS Latest">
    <Team web_address="www.angers-sco.fr" uID="t2128" short_club_name="Angers" region_name="Europe" region_id="17" country_iso="FR" country_id="8" country="France">
      <Founded>1919</Founded>
      <Name>Angers</Name>
      <Player uID="p40511">
        <Name>Denis Petric</Name>
        <Position>Goalkeeper</Position>
        <Stat Type="first_name">Denis</Stat>
        <Stat Type="last_name">Petric</Stat>
        <Stat Type="birth_date">1988-05-24</Stat>
        <Stat Type="weight">83</Stat>
        <Stat Type="height">187</Stat>
        <Stat Type="jersey_num">1</Stat>
        <Stat Type="real_position">Goalkeeper</Stat>
        <Stat Type="real_position_side">Unknown</Stat>
        <Stat Type="join_date">2016-01-02</Stat>
        <Stat Type="country">Slovenia</Stat>
      </Player>
      <Player uID="p119744">
        <Name>Mathieu Michel</Name>
        <Position>Goalkeeper</Position>
        <Stat Type="first_name">Mathieu</Stat>
        <Stat Type="last_name">Michel</Stat>
        <Stat Type="birth_date">1991-09-04</Stat>
        <Stat Type="birth_place">Nîmes</Stat>
        <Stat Type="first_nationality">France</Stat>
        <Stat Type="preferred_foot">Right</Stat>
        <Stat Type="weight">84</Stat>
        <Stat Type="height">189</Stat>
        <Stat Type="jersey_num">1</Stat>
        <Stat Type="real_position">Goalkeeper</Stat>
        <Stat Type="real_position_side">Unknown</Stat>
        <Stat Type="join_date">2016-08-18</Stat>
        <Stat Type="country">France</Stat>
      </Player>

So far I ran the following code:

library(tidyverse)
library(xml2)

x <- read_xml('player.xml')

Players3 <- x %>% 
  xml_find_all('//Player') %>% 
  map_df(~flatten(c(xml_attrs(.x), 
                map(xml_children(.x), 
                    ~set_names(as.list(xml_text(.x)), xml_name(.x)))))) %>%
  type_convert()

But for each player (uID) I got only the Name, Position, Loan and ONLY ONE Stat.

I am stuck because each player has the same node name (Stat) multiple times. I'd like to build a data frame from this XML file in which each Stat node becomes a column named after its Type attribute.

Something like:

uID | Name | Position | first_name | last_name | birth_date | weight | height | jersey_num | real_position | real_position_side | join_date | country | loan

As a bonus, it would be great if I could also get parent node information such as the Team uID and short_club_name.

Recommended answer

Here is a solution to try. See comments for an explanation of the process steps:

library(xml2)
library(dplyr)

x <- read_xml('player.xml')

Players3 <- x %>% xml_find_all('//Player') 

dfs<-lapply(Players3, function(node){
   #find names of all children nodes
   childnodes<-node %>% xml_children() %>% xml_name()
   #find the attr value from all child nodes
   names<-node %>% xml_children() %>% xml_attr("Type")
   #create columns names based on either node name or attr value
   names<-ifelse(is.na(names), childnodes, names)

   #find all values
   values<-node %>% xml_children() %>% xml_text()

   #create data frame and properly label the columns
   df<-data.frame(t(values), stringsAsFactors = FALSE)
   names(df)<-names
   df
})

#bind together and add uid to final dataframe.
answer<-bind_rows(dfs)
answer$UID<- Players3 %>% xml_attr("uID")
answer

#             Name   Position first_name last_name birth_date weight height jersey_num real_position
# 1   Denis Petric Goalkeeper      Denis    Petric 1988-05-24     83    187          1    Goalkeeper
# 2 Mathieu Michel Goalkeeper    Mathieu    Michel 1991-09-04     84    189          1    Goalkeeper
#   real_position_side  join_date  country birth_place first_nationality preferred_foot     UID
# 1            Unknown 2016-01-02 Slovenia        <NA>              <NA>           <NA>  p40511
# 2            Unknown 2016-08-18   France       Nimes            France          Right p119744
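
The answer above does not cover the bonus request for the parent Team information. One possible way to get it, sketched here under assumptions, is to look up each player's parent Team node with an XPath `parent::` step and read its attributes. The inline XML string is a trimmed stand-in for 'player.xml', and the names `xml_txt`, `players`, `teams` and `team_info` are illustrative only, not part of the original answer:

```r
library(xml2)

# Trimmed stand-in for 'player.xml' (assumption: same nesting as in the question)
xml_txt <- '<SoccerFeed>
  <SoccerDocument>
    <Team uID="t2128" short_club_name="Angers">
      <Name>Angers</Name>
      <Player uID="p40511">
        <Name>Denis Petric</Name>
        <Stat Type="first_name">Denis</Stat>
      </Player>
      <Player uID="p119744">
        <Name>Mathieu Michel</Name>
        <Stat Type="first_name">Mathieu</Stat>
      </Player>
    </Team>
  </SoccerDocument>
</SoccerFeed>'

x <- read_xml(xml_txt)
players <- xml_find_all(x, "//Player")

# xml_find_first() returns one result per input node, so `teams`
# lines up with `players` even when several players share a team
teams <- xml_find_first(players, "parent::Team")

# Player uID plus the attributes of its parent Team
team_info <- data.frame(
  UID = xml_attr(players, "uID"),
  team_uID = xml_attr(teams, "uID"),
  short_club_name = xml_attr(teams, "short_club_name"),
  stringsAsFactors = FALSE
)
team_info
```

Since `team_info` has one row per player in the same order as the player nodes, its columns could be attached to the `answer` data frame, for example with `merge(answer, team_info, by = "UID")`.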
