Build a family nested tree (parent/children relationship) in R


Question

I am working on family trees.

I have adapted Bob Horton's sqldf-based example: https://www.r-bloggers.com/exploring-recursive-ctes-with-sqldf/

My data:

      person            father
      Guillou Arthur    NA          
      Cleach Marc       NA          
      Guillou Eric      Guillou Arthur          
      Guillou Jacques   Guillou Arthur          
      Cleach Franck     Cleach Marc         
      Cleach Leo        Cleach Marc         
      Cleach Herbet     Cleach Leo          
      Cleach Adele      Cleach Herbet           
      Guillou Jean      Guillou Eric            
      Guillou Alan      Guillou Eric

My result: the descendants of "Guillou Arthur" (the top person, who has no father), ordered by level:

      name              parent_name       level
      Guillou Arthur    NA                1
      Guillou Eric      Guillou Arthur    2
      Guillou Jacques   Guillou Arthur    2
      Guillou Alan      Guillou Eric      3
      Guillou Jean      Guillou Eric      3

You can build this table with a recursive query in sqldf:

Data:

 person <- c("Guillou Arthur",
              "Cleach Marc",
              "Guillou Eric",
              "Guillou Jacques", 
              "Cleach Franck",
              "Cleach Leo",
              "Cleach Herbet",
              "Cleach Adele",
              "Guillou Jean",
              "Guillou Alan" )
 father <- c(NA, NA, "Guillou Arthur" , "Guillou Arthur", "Cleach Marc", "Cleach Marc", "Cleach Leo", "Cleach Herbet", "Guillou Eric", "Guillou Eric")


family <- data.frame(person, father)

Wide-to-long format conversion:

    library(tidyr)

    long_family <- gather(family, parent, parent_name, -person)

    long_family
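
As a side note, `gather()` is superseded in current tidyr releases in favour of `pivot_longer()`. The sketch below shows what the same wide-to-long step might look like with it, assuming tidyr 1.0.0 or later is installed; the name `long_family2` is only illustrative and is not part of the original post.

```r
library(tidyr)

# Same data as in the question
person <- c("Guillou Arthur", "Cleach Marc", "Guillou Eric", "Guillou Jacques",
            "Cleach Franck", "Cleach Leo", "Cleach Herbet", "Cleach Adele",
            "Guillou Jean", "Guillou Alan")
father <- c(NA, NA, "Guillou Arthur", "Guillou Arthur", "Cleach Marc",
            "Cleach Marc", "Cleach Leo", "Cleach Herbet", "Guillou Eric",
            "Guillou Eric")
family <- data.frame(person, father, stringsAsFactors = FALSE)

# Every column except `person` becomes a (parent, parent_name) pair
long_family2 <- pivot_longer(family, cols = -person,
                             names_to = "parent", values_to = "parent_name")
long_family2
```

With only a `father` column the long table has one row per person; the long format would start to matter if a `mother` column were added.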

Recursive query to find the descendants of "Guillou Arthur" (the top person, who has no father):

    library(sqldf)
      descendants_sql <- "
      WITH RECURSIVE descendants (name, parent_name, level) AS (
        SELECT person, parent_name, 1 FROM long_family 
          WHERE person = '%s'
          AND parent = '%s'

          UNION ALL
          SELECT F.person, F.parent_name, D.level + 1 
              FROM descendants D
              JOIN long_family F
              ON F.parent_name = D.name)

      SELECT * FROM descendants ORDER BY level, name
      "
      fam <- sqldf(sprintf(descendants_sql, 'Guillou Arthur', 'father'))
      fam   

My question:
How can I create a data.frame containing all the family trees directly in R (rather than in SQL)? Each tree starts with a patriarch (a person without a father), such as "Cleach Marc". Either an R method or an sqldf method would be fine.

Recommended answer

We build a recursive function that returns a person's father line (the person, then their father, then his father, and so on up to the patriarch). From there everything is easy.

First we define the data with stringsAsFactors = FALSE, so that the names are plain character strings rather than factors and are easier to work with.

family <- data.frame(person, father,stringsAsFactors = FALSE)

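The function:

father_line <- function(x){
  # father of x (NA when x is a patriarch)
  dad <- subset(family, person == x)$father
  # base case: no father, so the line ends here
  if (is.na(dad)) return(x)
  # otherwise put x in front of the father's own line
  c(x, father_line(dad))
}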

father_line ("Guillou Alan")
# [1] "Guillou Alan"   "Guillou Eric"   "Guillou Arthur"
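
For larger tables the same information can be computed without recursion. The following is a minimal sketch, not part of the original answer, that walks every person up one generation per loop pass using `match()`. The helper names `idx`, `cur`, and `up` are illustrative. It assumes each person appears once in `person`; a father who is named but has no row of his own is treated as the end of the line.

```r
person <- c("Guillou Arthur", "Cleach Marc", "Guillou Eric", "Guillou Jacques",
            "Cleach Franck", "Cleach Leo", "Cleach Herbet", "Cleach Adele",
            "Guillou Jean", "Guillou Alan")
father <- c(NA, NA, "Guillou Arthur", "Guillou Arthur", "Cleach Marc",
            "Cleach Marc", "Cleach Leo", "Cleach Herbet", "Guillou Eric",
            "Guillou Eric")
family <- data.frame(person, father, stringsAsFactors = FALSE)

# Row index of each person's father (NA for patriarchs)
idx <- match(family$father, family$person)

level     <- rep(1L, nrow(family))   # everyone starts at level 1
patriarch <- family$person           # everyone starts as their own root
cur       <- idx                     # ancestor currently being visited

# At most nrow(family) passes, so the loop also stops on cyclic data
for (step in seq_len(nrow(family))) {
  up <- !is.na(cur)                  # people who still have an ancestor above
  if (!any(up)) break
  level[up]     <- level[up] + 1L
  patriarch[up] <- family$person[cur[up]]
  cur[up]       <- idx[cur[up]]      # move one generation up
}

data.frame(person = family$person, level, patriarch)
```

The loop runs once per generation rather than once per person, which avoids deep recursion on long lineages.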

Use it to get the level and other information:

family$father_line <- lapply(family$person,father_line)
family$level       <- lengths(family$father_line)
family$patriarch   <- sapply(family$father_line,tail,1)

#             person         father                                          father_line level      patriarch
# 1   Guillou Arthur           <NA>                                       Guillou Arthur     1 Guillou Arthur
# 2      Cleach Marc           <NA>                                          Cleach Marc     1    Cleach Marc
# 3     Guillou Eric Guillou Arthur                         Guillou Eric, Guillou Arthur     2 Guillou Arthur
# 4  Guillou Jacques Guillou Arthur                      Guillou Jacques, Guillou Arthur     2 Guillou Arthur
# 5    Cleach Franck    Cleach Marc                           Cleach Franck, Cleach Marc     2    Cleach Marc
# 6       Cleach Leo    Cleach Marc                              Cleach Leo, Cleach Marc     2    Cleach Marc
# 7    Cleach Herbet     Cleach Leo               Cleach Herbet, Cleach Leo, Cleach Marc     3    Cleach Marc
# 8     Cleach Adele  Cleach Herbet Cleach Adele, Cleach Herbet, Cleach Leo, Cleach Marc     4    Cleach Marc
# 9     Guillou Jean   Guillou Eric           Guillou Jean, Guillou Eric, Guillou Arthur     3 Guillou Arthur
# 10    Guillou Alan   Guillou Eric           Guillou Alan, Guillou Eric, Guillou Arthur     3 Guillou Arthur
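
The question asks for a single data.frame covering all family trees. One possible way to get there from these columns, sketched here with base R only, is to sort by patriarch and then by level. The name `all_trees` is illustrative, and the block repeats the data and the function so that it runs on its own.

```r
person <- c("Guillou Arthur", "Cleach Marc", "Guillou Eric", "Guillou Jacques",
            "Cleach Franck", "Cleach Leo", "Cleach Herbet", "Cleach Adele",
            "Guillou Jean", "Guillou Alan")
father <- c(NA, NA, "Guillou Arthur", "Guillou Arthur", "Cleach Marc",
            "Cleach Marc", "Cleach Leo", "Cleach Herbet", "Guillou Eric",
            "Guillou Eric")
family <- data.frame(person, father, stringsAsFactors = FALSE)

# Recursive helper from the answer: person, father, grandfather, ...
father_line <- function(x) {
  dad <- family$father[family$person == x]
  if (is.na(dad)) return(x)
  c(x, father_line(dad))
}

lines <- lapply(family$person, father_line)
family$level     <- lengths(lines)
family$patriarch <- vapply(lines, function(l) l[length(l)], character(1))

# One table for every tree: grouped by patriarch, then by generation
all_trees <- family[order(family$patriarch, family$level, family$person),
                    c("patriarch", "person", "father", "level")]
rownames(all_trees) <- NULL
all_trees
```

If one data.frame per family is preferred, `split(all_trees, all_trees$patriarch)` returns a named list of them.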

For example, to get the expected output stated in the question:

subset(family,patriarch == "Guillou Arthur",select=c(person,father,level))
#             person         father level
# 1   Guillou Arthur           <NA>     1
# 3     Guillou Eric Guillou Arthur     2
# 4  Guillou Jacques Guillou Arthur     2
# 9     Guillou Jean   Guillou Eric     3
# 10    Guillou Alan   Guillou Eric     3 
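
If a literally nested structure is wanted rather than a flat table, the same parent/child pairs can be turned into nested lists. This is only a sketch of one possible representation, not something the original answer provides; `build_tree` and `forest` are hypothetical names, and each node is simply a named list of its children.

```r
person <- c("Guillou Arthur", "Cleach Marc", "Guillou Eric", "Guillou Jacques",
            "Cleach Franck", "Cleach Leo", "Cleach Herbet", "Cleach Adele",
            "Guillou Jean", "Guillou Alan")
father <- c(NA, NA, "Guillou Arthur", "Guillou Arthur", "Cleach Marc",
            "Cleach Marc", "Cleach Leo", "Cleach Herbet", "Guillou Eric",
            "Guillou Eric")
family <- data.frame(person, father, stringsAsFactors = FALSE)

# Returns a named list with one element per child of `name`;
# a person without children yields an empty list
build_tree <- function(name, data) {
  kids <- data$person[!is.na(data$father) & data$father == name]
  setNames(lapply(kids, build_tree, data = data), kids)
}

# One tree per patriarch (person with no father)
roots  <- family$person[is.na(family$father)]
forest <- setNames(lapply(roots, build_tree, data = family), roots)

str(forest)
```

Subtrees can then be reached by name, for example `forest[["Cleach Marc"]][["Cleach Leo"]]`.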

The tidyverse version looks like this:

library(tidyverse)
family %>%
  mutate(family_line = map(person,father_line),
         level = lengths(family_line),
         patriarch = map_chr(family_line,last)) %>%  # map_chr gives a character column, not a list
  filter(patriarch == "Guillou Arthur") %>%
  select(person,father,level)

#            person         father level
# 1  Guillou Arthur           <NA>     1
# 2    Guillou Eric Guillou Arthur     2
# 3 Guillou Jacques Guillou Arthur     2
# 4    Guillou Jean   Guillou Eric     3
# 5    Guillou Alan   Guillou Eric     3
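
Since the question also allows an sqldf method, the recursive query from the question can probably be extended to all trees at once by seeding the recursion with every person whose father is NULL and carrying the patriarch's name down through each step. This is a sketch under that assumption, not part of the original answer. It assumes the sqldf package with its default SQLite backend, and `all_trees_sql` and `res` are illustrative names.

```r
library(sqldf)

person <- c("Guillou Arthur", "Cleach Marc", "Guillou Eric", "Guillou Jacques",
            "Cleach Franck", "Cleach Leo", "Cleach Herbet", "Cleach Adele",
            "Guillou Jean", "Guillou Alan")
father <- c(NA, NA, "Guillou Arthur", "Guillou Arthur", "Cleach Marc",
            "Cleach Marc", "Cleach Leo", "Cleach Herbet", "Guillou Eric",
            "Guillou Eric")
family <- data.frame(person, father, stringsAsFactors = FALSE)

all_trees_sql <- "
WITH RECURSIVE descendants (name, parent_name, level, patriarch) AS (
  -- anchor: every patriarch (R's NA arrives in SQLite as NULL)
  SELECT person, father, 1, person FROM family WHERE father IS NULL

  UNION ALL
  -- step: children of anyone already found inherit that row's patriarch
  SELECT F.person, F.father, D.level + 1, D.patriarch
    FROM descendants D
    JOIN family F ON F.father = D.name)

SELECT * FROM descendants ORDER BY patriarch, level, name
"

res <- sqldf(all_trees_sql)
res
```

This variant reads the wide `family` table directly, so the wide-to-long conversion step is not needed.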
