Collapsing duplicate rows in R by two variables
Question
I have partially duplicated rows in my data set. These rows match on two variables, and each has NAs in some of the remaining variables. If I could combine each pair of partially duplicated rows, I would have a complete case for that row.
How can I combine rows in my data set that share the same values for two variables, so that the NAs in each separate row are filled in and one complete row is left?
# Example data: rows match on a and b, with NAs scattered across c and d
a <- c(1, 1, 1, 1)
b <- c(1, 1, 3, 3)
c <- c(NA, 0, NA, NA)
d <- c(0, NA, 0, NA)
y <- data.frame(a, b, c, d)
head(y)
# Desired result: one row per (a, b) pair
a1 <- c(1, 1)
b1 <- c(1, 3)
c1 <- c(0, NA)
d1 <- c(0, 0)
z <- data.frame(a1, b1, c1, d1)
head(z)
Recommended answer
We can use data.table. Convert the 'data.frame' to a 'data.table' (setDT(y)), group by 'a' and 'b', loop through the columns of the Subset of Data.table (.SD), and keep the non-NA elements of each column.
library(data.table)
# Within each (a, b) group, keep only the non-NA values of every other column
setDT(y)[, lapply(.SD, function(x) x[!is.na(x)]), by = .(a, b)]
#    a b  c d
# 1: 1 1  0 0
# 2: 1 3 NA 0
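A possible variation on the call above, offered as a sketch rather than as part of the original answer: when a column is entirely NA within a group, x[!is.na(x)] is a zero-length vector, and a group with several non-NA values would produce more than one row. Assuming each group should collapse to exactly one row, taking the first non-NA value makes that explicit. The helper name first_non_na is hypothetical and not part of data.table.

```r
library(data.table)

# Example data from the question
y <- data.frame(
  a = c(1, 1, 1, 1),
  b = c(1, 1, 3, 3),
  c = c(NA, 0, NA, NA),
  d = c(0, NA, 0, NA)
)

# Hypothetical helper (not a data.table function): the first non-NA value
# of x, or NA of the same type when every value in x is missing
first_non_na <- function(x) x[!is.na(x)][1L]

# as.data.table() copies y, so the original data.frame is left unchanged
collapsed <- as.data.table(y)[, lapply(.SD, first_non_na), by = .(a, b)]
print(collapsed)
```

Indexing with [1L] always returns a length-one vector, so every group contributes exactly one row, and a column with no observed value in a group stays NA.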