Merging rows with the same ID variable
I have a data frame in R with 2186 observations of 38 variables. Each row has an ID variable that identifies an experiment, and using
length(unique(df$ID))==nrow(df)
n_occur<-data.frame(table(df$ID))
I know that 327 of my rows have repeated IDs, with some IDs repeated more than once. I am trying to merge rows with the same ID, as these aren't duplicates but just the second, third, etc. observations within a given experiment.
So, for example, if I had
x y ID
1 2 a
1 3 b
2 4 c
1 3 d
1 4 a
3 2 b
2 3 a
I would like to end up with
x y ID x2 y2 ID2 x3 y3 ID3
1 2 a 1 4 a 2 3 a
1 3 b 3 2 b na na na
2 4 c na na na na na na
1 3 d na na na na na na
I've seen similar questions for SQL and PHP, but they haven't helped me with my attempts in R. Any help would be greatly appreciated.
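The duplicate check described in the question can be tried on the small example above. This is only a sketch: the toy data frame is rebuilt by hand, and the variable name repeated is introduced here for illustration.

```r
# Toy data from the example in the question (df is the name used there)
df <- data.frame(
  x  = c(1, 1, 2, 1, 1, 3, 2),
  y  = c(2, 3, 4, 3, 4, 2, 3),
  ID = c("a", "b", "c", "d", "a", "b", "a")
)

# FALSE means at least one ID occurs more than once
all_unique <- length(unique(df$ID)) == nrow(df)

# Count occurrences per ID; data.frame(table(...)) names the columns Var1 and Freq
n_occur <- data.frame(table(df$ID))

# IDs that appear more than once (hypothetical helper name)
repeated <- n_occur[n_occur$Freq > 1, ]

# Number of rows that belong to a repeated ID
n_rows_repeated <- sum(df$ID %in% as.character(repeated$Var1))

print(all_unique)
print(repeated)
print(n_rows_repeated)
```

On the toy data, only the IDs a and b are flagged as repeated.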
You could use the enhanced dcast function from the data.table package, which lets you select multiple value variables. setDT(mydf) converts your data frame to a data.table, and [, idx := 1:.N, by = ID] adds an index within each ID, which you then use in the dcast formula:
library(data.table)
dcast(setDT(mydf)[, idx := 1:.N, by = ID], ID ~ idx, value.var = c("x","y"))
Or, with data.table v1.9.7+ (the development version at the time of writing; rowid has shipped in CRAN releases since v1.9.8), you can use the rowid function:
dcast(setDT(mydf), ID ~ rowid(ID), value.var = c("x","y"))
This gives:
ID x_1 x_2 x_3 y_1 y_2 y_3
1: a 1 1 2 2 4 3
2: b 1 3 NA 3 2 NA
3: c 2 NA NA 4 NA NA
4: d 1 NA NA 3 NA NA
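To see what the index step contributes, it may help to run it separately before casting. The sketch below assumes the data.table package is installed; the names dt and wide are made up for the illustration, and seq_len(.N) is used in place of 1:.N, which behaves the same for non-empty groups.

```r
library(data.table)

# Same values as the example data, built directly as a data.table
dt <- data.table(
  x  = c(1L, 1L, 2L, 1L, 1L, 3L, 2L),
  y  = c(2L, 3L, 4L, 3L, 4L, 2L, 3L),
  ID = c("a", "b", "c", "d", "a", "b", "a")
)

# Number the observations within each ID in order of appearance
dt[, idx := seq_len(.N), by = ID]
print(dt)

# Cast to wide format: one row per ID, one x_* and y_* column per index value
wide <- dcast(dt, ID ~ idx, value.var = c("x", "y"))
print(wide)
```

The idx column is what turns the second and third observations of an ID into the _2 and _3 columns; IDs with fewer observations get NA in the columns they do not fill.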
Used data:
mydf <- structure(list(x = c(1L, 1L, 2L, 1L, 1L, 3L, 2L), y = c(2L, 3L,
4L, 3L, 4L, 2L, 3L), ID = structure(c(1L, 2L, 3L, 4L, 1L, 2L,
1L), .Label = c("a", "b", "c", "d"), class = "factor")), .Names = c("x",
"y", "ID"), class = "data.frame", row.names = c(NA, -7L))