Merging rows with the same ID variable

Problem description

I have a data frame in R with 2186 observations of 38 variables. Each row has an ID variable that refers to an experiment. I checked whether the IDs are unique and counted how often each one occurs using

length(unique(df$ID)) == nrow(df)

n_occur <- data.frame(table(df$ID))

From this I know that 327 of my rows have repeated IDs, with some IDs repeated more than once. I am trying to merge rows with the same ID, because these are not duplicates but the second, third, etc. observations within a given experiment.

So for example if I had

x y ID
1 2 a
1 3 b
2 4 c
1 3 d
1 4 a
3 2 b
2 3 a

I would like to end up with

x y ID x2 y2 ID2 x3 y3 ID3
1 2 a  1  4  a   2  3  a
1 3 b  3  2  b   na na na
2 4 c  na na na  na na na
1 3 d  na na na  na na na

I've seen similar questions for SQL and PHP, but they haven't helped with my attempts in R. Any help would be greatly appreciated.
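
As a quick illustration of the check described in the question, the sketch below runs it on the small example data. The names all_unique, repeated_ids and n_extra are made up for this illustration; only base R is used.

```r
# Example data from the question (ID stored as character)
df <- data.frame(x  = c(1, 1, 2, 1, 1, 3, 2),
                 y  = c(2, 3, 4, 3, 4, 2, 3),
                 ID = c("a", "b", "c", "d", "a", "b", "a"))

# TRUE only if every ID occurs exactly once
all_unique <- length(unique(df$ID)) == nrow(df)

# Frequency table of IDs; data.frame() names the columns Var1 and Freq
n_occur <- data.frame(table(df$ID))

# IDs that occur more than once (hypothetical helper variable)
repeated_ids <- as.character(n_occur$Var1[n_occur$Freq > 1])

# Number of rows that are a second, third, ... observation of an ID
n_extra <- sum(duplicated(df$ID))

print(n_occur)
print(df[df$ID %in% repeated_ids, ])
```

Here the IDs a and b are flagged, and n_extra counts the three rows that would be moved into the x2/y2 and x3/y3 columns.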

Solution

You could use the enhanced dcast function from the data.table package, which accepts multiple value variables. setDT(mydf) converts the data frame to a data.table, and [, idx := 1:.N, by = ID] adds an index within each ID, which is then used in the dcast formula:

library(data.table)
dcast(setDT(mydf)[, idx := 1:.N, by = ID], ID ~ idx, value.var = c("x","y"))

Or, with data.table v1.9.7 or later (the development version when this answer was written), you can use the rowid function:

dcast(setDT(mydf), ID ~ rowid(ID), value.var = c("x","y"))

Both give:

   ID x_1 x_2 x_3 y_1 y_2 y_3
1:  a   1   1   2   2   4   3
2:  b   1   3  NA   3   2  NA
3:  c   2  NA  NA   4  NA  NA
4:  d   1  NA  NA   3  NA  NA


Data used:

mydf <- structure(list(x = c(1L, 1L, 2L, 1L, 1L, 3L, 2L), y = c(2L, 3L, 
4L, 3L, 4L, 2L, 3L), ID = structure(c(1L, 2L, 3L, 4L, 1L, 2L, 
1L), .Label = c("a", "b", "c", "d"), class = "factor")), .Names = c("x", 
"y", "ID"), class = "data.frame", row.names = c(NA, -7L))
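
If data.table is not available, the same two steps (number the rows within each ID, then spread them into wide format) can be sketched in base R. This is an alternative to the dcast approach above, not part of it: it assumes ave() for the within-ID index and reshape() for the pivot, and the column order of the result may differ from the dcast output.

```r
# Same example data, with ID as a character column
mydf <- data.frame(x  = c(1L, 1L, 2L, 1L, 1L, 3L, 2L),
                   y  = c(2L, 3L, 4L, 3L, 4L, 2L, 3L),
                   ID = c("a", "b", "c", "d", "a", "b", "a"))

# Step 1: running index within each ID (1 for the first observation,
# 2 for the second, ...); plays the role of idx := 1:.N, by = ID
mydf$idx <- ave(seq_along(mydf$ID), mydf$ID, FUN = seq_along)

# Step 2: pivot to wide format, one row per ID; x and y get a suffix
# _1, _2, _3 taken from idx, and missing observations become NA
wide <- reshape(mydf, idvar = "ID", timevar = "idx",
                direction = "wide", sep = "_")

print(wide)
```

Like dcast, this fills missing second and third observations with NA, so IDs c and d have NA in every _2 and _3 column.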
