Data reorganization in R


Question

I have the following type of data:

Person <- c("A", "B", "C", "AB", "BC", "AC",  "D", "E")
Father <- c(NA,  NA,  NA,   "A", "B", "C",    NA, "D")
Mother <- c(NA,  NA,  NA, "B",   "C", "A", "C",    NA)
var1 <- c(  1,   2,   3,     4,   2,   1,     6, 9)
var2 <- c(1.4, 2.3, 4.3,  3.4, 4.2, 6.1,   2.6, 8.2)
myd <- data.frame (Person, Father, Mother, var1, var2)

  Person Father Mother var1 var2
1      A   <NA>   <NA>    1  1.4
2      B   <NA>   <NA>    2  2.3
3      C   <NA>   <NA>    3  4.3
4     AB      A      B    4  3.4
5     BC      B      C    2  4.2
6     AC      C      A    1  6.1
7      D   <NA>      C    6  2.6
8      E      D   <NA>    9  8.2

Here NA stands for missing (unknown). I want to reorganize the data into trios (an individual plus its father and mother). For example, the trio for individual AB will include the data from its father A and its mother B:

  Person Father Mother var1 var2
1      A   <NA>   <NA>    1  1.4
2      B   <NA>   <NA>    2  2.3
4     AB      A      B    4  3.4

A, B, and C cannot form trios, as they have no known parents. In some cases only one parent is known; for example, E has only a known father, D. In that case the trio has just two members, as with D, whose only known parent is its mother C:

  7      D   <NA>      C    6  2.6
  3      C   <NA>   <NA>    3  4.3

If a mother or father appears in more than one trio, the same values are reused in each.

Thus the expected complete output would be:

   Person Father Mother var1 var2 Trio
1       A   <NA>   <NA>    1  1.4    1
2       B   <NA>   <NA>    2  2.3    1
4      AB      A      B    4  3.4    1

2       B   <NA>   <NA>    2  2.3    2
3       C   <NA>   <NA>    3  4.3    2
5      BC      B      C    2  4.2    2

1       A   <NA>   <NA>    1  1.4    3
3       C   <NA>   <NA>    3  4.3    3
6      AC      C      A    1  6.1    3

NA   <NA>   <NA>   <NA>   NA   NA    4
3       C   <NA>   <NA>    3  4.3    4
7       D   <NA>      C    6  2.6    4

NA   <NA>   <NA>   <NA>   NA   NA    5
7       D   <NA>      C    6  2.6    5
8       E      D   <NA>    9  8.2    5


Recommended answer


This may be roughly what you want:

Person <- c("A", "B", "C", "AB", "BC", "AC",  "D", "E")
Father <- c(NA,  NA,  NA,   "A", "B", "C",    NA, "D")
Mother <- c(NA,  NA,  NA, "B",   "C", "A", "C",    NA)
var1 <- c(  1,   2,   3,     4,   2,   1,     6, 9)
var2 <- c(1.4, 2.3, 4.3,  3.4, 4.2, 6.1,   2.6, 8.2)
myd <- data.frame(Person, Father, Mother, var1, var2, stringsAsFactors = FALSE)

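Note that the definition of myd has changed slightly: it now sets stringsAsFactors to FALSE, so that Person, Father, and Mother are stored as character vectors rather than factors.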

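# Return row x of myd together with the rows of its known parents,
# labelled with Trio = x (the row index of the individual)
parentage <- function(x, myd) {
    y <- myd[x, ]
    p1 <- y$Father
    p2 <- y$Mother
    out <- y
    # append the father's row, if the father is known
    if (!is.na(p1)) {
        out <- rbind(out, myd[myd$Person == p1, ])
    }
    # append the mother's row, if the mother is known
    if (!is.na(p2)) {
        out <- rbind(out, myd[myd$Person == p2, ])
    }
    out$Trio <- x
    out
}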

ans<-lapply(seq_along(myd$Person),parentage,myd)

 > ans
[[1]]
  Person Father Mother var1 var2 Trio
1      A   <NA>   <NA>    1  1.4    1

[[2]]
  Person Father Mother var1 var2 Trio
2      B   <NA>   <NA>    2  2.3    2

[[3]]
  Person Father Mother var1 var2 Trio
3      C   <NA>   <NA>    3  4.3    3

[[4]]
  Person Father Mother var1 var2 Trio
4     AB      A      B    4  3.4    4
1      A   <NA>   <NA>    1  1.4    4
2      B   <NA>   <NA>    2  2.3    4

[[5]]
  Person Father Mother var1 var2 Trio
5     BC      B      C    2  4.2    5
2      B   <NA>   <NA>    2  2.3    5
3      C   <NA>   <NA>    3  4.3    5

[[6]]
  Person Father Mother var1 var2 Trio
6     AC      C      A    1  6.1    6
3      C   <NA>   <NA>    3  4.3    6
1      A   <NA>   <NA>    1  1.4    6

[[7]]
  Person Father Mother var1 var2 Trio
7      D   <NA>      C    6  2.6    7
3      C   <NA>   <NA>    3  4.3    7

[[8]]
  Person Father Mother var1 var2 Trio
8      E      D   <NA>    9  8.2    8
7      D   <NA>      C    6  2.6    8

If you want a single data frame instead of a list, you can use the plyr package:

library(plyr)
ans<-adply(seq_along(myd$Person),1,parentage,myd)
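
The answer above numbers each group by row index and also returns single-row groups for A, B, and C, which the question excludes. As a rough base R sketch (not part of the original answer), the following keeps only individuals with at least one known parent, numbers the trios consecutively, and stacks them with do.call(rbind, ...) instead of plyr. The helper name trio_rows is hypothetical. Unlike the expected output in the question, it adds no all-NA placeholder row for an unknown parent, and parents are listed father first.

```r
# Same example data as in the question
myd <- data.frame(
  Person = c("A", "B", "C", "AB", "BC", "AC", "D", "E"),
  Father = c(NA, NA, NA, "A", "B", "C", NA, "D"),
  Mother = c(NA, NA, NA, "B", "C", "A", "C", NA),
  var1   = c(1, 2, 3, 4, 2, 1, 6, 9),
  var2   = c(1.4, 2.3, 4.3, 3.4, 4.2, 6.1, 2.6, 8.2),
  stringsAsFactors = FALSE
)

# Hypothetical helper: rows of the known parents, followed by the child's row
trio_rows <- function(i, d) {
  parents <- c(d$Father[i], d$Mother[i])
  parents <- parents[!is.na(parents)]
  d[c(match(parents, d$Person), i), ]
}

# Row indices of individuals with at least one known parent
has_parent <- which(!is.na(myd$Father) | !is.na(myd$Mother))

trios <- lapply(seq_along(has_parent), function(k) {
  out <- trio_rows(has_parent[k], myd)
  out$Trio <- k  # consecutive trio number
  out
})

# Stack the list of trios into one data frame
trio_df <- do.call(rbind, trios)
rownames(trio_df) <- NULL
```

Using match() rather than a logical comparison means a parent that is named but absent from Person yields an all-NA row instead of being silently dropped, which may or may not be what you want.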
