如果缺少,则使用另一个中的值更新现有的data.frame [英] Update existing data.frame with values from another one if missing
问题描述
我正在以下代码的R(首选base和data.table)中寻找(1)名称和(2)一种(更清洁的)方法.
I'm looking for the (1) name and (2) a (cleaner) method in R (base and data.table preferred) of the following.
输入
> d1
id x y
1 1 1 NA
2 2 NA 3
3 3 4 NA
> d2
id x y z
1 4 NA 30 a
2 3 20 2 b
3 2 14 NA c
4 1 15 97 d
(请注意,实际数据帧有数百列)
(note that the actual data.frames have hundreds of columns)
预期产量:
> d1
id x y z
1 1 1 97 d
2 2 14 3 c
3 3 4 2 b
数据和当前解决方案:
d1 <- data.frame(id = 1:3, x = c(1, NA, 4), y = c(NA, 3, NA))
d2 <- data.frame(id = 4:1, x = c(NA, 20, 14, 15), y = c(30, 2, NA, 97), z = letters[1:4])
for (col in setdiff(names(d1), "id")) {
# If missing look in d2
missing <- is.na(d1[[col]])
d1[missing, col] <- d2[match(d1$id[missing], d2$id), col]
}
for (col in setdiff(names(d2), names(d1))) {
# If column missing then add
d1[[col]] <- d2[match(d1$id, d2$id), col]
}
PS:
就像以前曾经问过这个问题,但我缺乏搜索它的词汇.
Likely this questions has been asked before but I'm lacking in vocabulary to search it.
推荐答案
假设您正在使用2个data.frame,这是一个基本解决方案
Assuming you are working with 2 data.frames, here is a base solution
#expand d1 to have the same columns as d2
d <- merge(d1, d2[, c("id", setdiff(names(d2), names(d1))), drop=FALSE],
by="id", all.x=TRUE, all.y=FALSE)
#make sure that d2 also have same number of columns as d1
d2 <- merge(d2, d1[, c("id", setdiff(names(d1), names(d2))), drop=FALSE],
by="id", all.x=TRUE, all.y=FALSE)
#align rows and columns to match those in d1
mask <- d2[match(d1$id, d2$id), names(d)]
#replace NAs with those mask
replace(d, is.na(d), mask[is.na(d)])
如果您不介意,我们可以将您的问题改写成一个通用的矩阵问题(即任意数量的矩阵,列,行),好像以前从未有人问过.
If you dont mind, we can rewrite your question into a general matrix-coalesce question (i.e. any number of matrices, columns, rows) which seems like it has not been asked before.
另一种基本的R解决方案是来自
Another base R solution is a hack of coalesce1a
from How to implement coalesce efficiently in R
coalesce.mat <- function(...) {
ans <- ..1
for (elt in list(...)[-1]) {
rn <- match(ans$id, elt$id)
ans[is.na(ans)] <- elt[rn, names(ans)][is.na(ans)]
}
ans
}
allcols <- Reduce(union, lapply(list(d1, d2), names))
do.call(coalesce.mat,
lapply(list(d1, d2), function(x) {
x[, setdiff(allcols, names(x))] <- NA
x
}))
edit:
使用中的coalesce1a
的可能的data.table
解决方案马丁·摩根(Martin Morgan)如何在R 中有效地实现合并.
a possible data.table
solution using coalesce1a
from How to implement coalesce efficiently in R by Martin Morgan.
coalesce1a <- function(...) {
ans <- ..1
for (elt in list(...)[-1]) {
i <- which(is.na(ans))
ans[i] <- elt[i]
}
ans
}
setDT(d1)
setDT(d2)
#melt into long formats and full outer join the 2
mdt <- merge(melt(d1, id.vars="id"), melt(d2, id.vars="id"), by=c("id","variable"), all=TRUE)
#perform a coalesce on vectors
mdt[, value := do.call(coalesce1a, .SD), .SDcols=grep("value", names(mdt), value=TRUE)]
#pivot into original format and subset to those in d1
dcast.data.table(mdt, id ~ variable, value.var="value")[
d1, .SD, on=.(id)]
这篇关于如果缺少,则使用另一个中的值更新现有的data.frame的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!