Duplication of Rows in data.frame in R
Question
I have a large data.frame, which looks similar to the example below:
ID date sex grade location
1 1 2000 m 1 x
2 1 2001 m 2 y
3 2 1999 f 3 z
4 2 2000 f 4 f
5 3 2000 m 5 k
6 3 2001 m 6 l
To reproduce it, run:
df <- data.frame(ID=c(1,1,2,2,3,3),
date=c(2000,2001,1999,2000,2000,2001),
sex = c("m", "m", "f", "f", "m", "m"),
grade =c(1,2,3,4,5,6),
location =c("x","y","z", "f","k","l") )
I would like to reshape my data.frame so that every ID has a row for every date, giving the following structure:
ID date sex grade location
1 1 1999 m 0 0
2 1 2000 m 1 x
3 1 2001 m 2 y
4 2 1999 f 3 z
5 2 2000 f 4 f
6 2 2001 f 0 0
7 3 1999 m 0 0
8 3 2000 m 5 k
9 3 2001 m 6 l
Recommended answer
This can be done with data.table like so:
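library(data.table)
# Convert df to a data.table by reference, keyed on ID and date
setDT(df, key = c("ID", "date"))
# Join df against every unique (ID, date) combination; missing pairs become NA rows
df[CJ(ID, date, unique = TRUE)]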
ID date sex grade location
1: 1 1999 NA NA NA
2: 1 2000 m 1 x
3: 1 2001 m 2 y
4: 2 1999 f 3 z
5: 2 2000 f 4 f
6: 2 2001 NA NA NA
7: 3 1999 NA NA NA
8: 3 2000 m 5 k
9: 3 2001 m 6 l
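For readers without data.table, the same cross-join idea can be sketched in base R. This is an illustration rather than part of the original answer: the names grid and full are made up here, and stringsAsFactors = FALSE is assumed so that sex and location stay character.

```r
# Same example data as in the question (character columns assumed)
df <- data.frame(ID = c(1, 1, 2, 2, 3, 3),
                 date = c(2000, 2001, 1999, 2000, 2000, 2001),
                 sex = c("m", "m", "f", "f", "m", "m"),
                 grade = c(1, 2, 3, 4, 5, 6),
                 location = c("x", "y", "z", "f", "k", "l"),
                 stringsAsFactors = FALSE)

# Every combination of the unique IDs and the unique dates (3 x 3 = 9 rows)
grid <- expand.grid(ID = unique(df$ID), date = unique(df$date))

# Left join: combinations that are absent from df get NA in the other columns
full <- merge(grid, df, by = c("ID", "date"), all.x = TRUE)

# Make the row order explicit: by ID, then by date
full <- full[order(full$ID, full$date), ]
```

The three combinations that do not occur in df (ID 1 in 1999, ID 2 in 2001, ID 3 in 1999) come back with NA in sex, grade and location, mirroring the data.table output above.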
If you want sex to be filled in consistently within each ID:
df <- df[CJ(ID, date, unique = TRUE)]
df[ , sex := unique(na.omit(sex)), by = ID]
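The grouped fill relies on each ID having exactly one distinct non-missing value of sex. A small base R sketch of that assumption and of the fill itself, using made-up vectors id and sex that stand in for the joined columns (n_sex and sex_filled are hypothetical names):

```r
# Stand-in for the joined data: each ID has gained one row with sex = NA
id  <- c(1, 1, 1, 2, 2, 2, 3, 3, 3)
sex <- c(NA, "m", "m", "f", "f", NA, NA, "m", "m")

# Number of distinct non-missing values of sex within each ID
n_sex <- tapply(sex, id, function(s) length(unique(na.omit(s))))
stopifnot(all(n_sex == 1))  # the fill is only well defined when this holds

# Base R counterpart of the grouped assignment: the single value per ID
# is recycled across all rows of that ID
sex_filled <- ave(sex, id, FUN = function(s) unique(na.omit(s)))
```

If some ID carried two different values, unique(na.omit(sex)) would return more than one element and the fill would no longer be unambiguous, so a check like the one on n_sex may be worth running first.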
If you really want 0s instead of NA for grade and location (you should reconsider this, as it's likely preferable to leave them as NA):
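df[is.na(grade), grade := 0]
# Append "0" as a level if location is a factor; prepending it would relabel existing values
if (is.factor(df$location)) df[, location := factor(location, levels = c(levels(location), "0"))]
df[is.na(location), location := "0"]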
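When location is a factor (the data.frame() default before R 4.0.0), the position at which "0" is added to the levels matters. A minimal base R sketch with a made-up vector loc (bad and good are hypothetical names):

```r
loc <- factor(c("x", NA, "y"))   # levels are "x", "y"

# Prepending a level shifts every existing label by one position
bad <- loc
levels(bad) <- c("0", levels(bad))
as.character(bad)    # "0" NA "x": the original values were silently relabelled

# Appending keeps the existing labels and makes "0" available for assignment
good <- loc
levels(good) <- c(levels(good), "0")
good[is.na(good)] <- "0"
as.character(good)   # "x" "0" "y"
```

If location is a plain character column, no level handling is needed and the NA values can be replaced directly.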