Update an existing data.frame with values from another one if missing


Question

I'm looking for (1) the name of the following operation and (2) a cleaner way to do it in R (base R and data.table preferred).

Input:

> d1
  id  x  y
1  1  1 NA
2  2 NA  3
3  3  4 NA
> d2
  id  x  y z
1  4 NA 30 a
2  3 20  2 b
3  2 14 NA c
4  1 15 97 d

(Note that the actual data.frames have hundreds of columns.)

Expected output:

> d1
  id  x  y z
1  1  1 97 d
2  2 14  3 c
3  3  4  2 b

Data and current solution:

d1 <- data.frame(id = 1:3, x = c(1, NA, 4), y = c(NA, 3, NA))
d2 <- data.frame(id = 4:1, x = c(NA, 20, 14, 15), y = c(30, 2, NA, 97), z = letters[1:4])

for (col in setdiff(names(d1), "id")) {
  # If missing look in d2
  missing <- is.na(d1[[col]])
  d1[missing, col] <- d2[match(d1$id[missing], d2$id), col]
} 
for (col in setdiff(names(d2), names(d1))) {
  # If column missing then add
  d1[[col]] <- d2[match(d1$id, d2$id), col]
}

PS:

This question has likely been asked before, but I lack the vocabulary to search for it.

Recommended answer

Assuming you are working with two data.frames, here is a base R solution:

#expand d1 with the columns that only d2 has
d <- merge(d1, d2[, c("id", setdiff(names(d2), names(d1))), drop=FALSE], 
    by="id", all.x=TRUE, all.y=FALSE)

#make sure that d2 also has every column of d1
d2 <- merge(d2, d1[, c("id", setdiff(names(d1), names(d2))), drop=FALSE], 
    by="id", all.x=TRUE, all.y=FALSE)

#align rows and columns of d2 with those of d (merge() sorts d by id)
mask <- d2[match(d$id, d2$id), names(d)]

#replace the NAs column by column so every column keeps its type
#(extracting with a logical matrix, as in mask[is.na(d)], goes through as.matrix() and turns mixed columns into character)
d[] <- Map(function(a, b) replace(a, is.na(a), b[is.na(a)]), d, mask)
d
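The operation can be described in SQL terms as a left join followed by COALESCE on every shared column. The sketch below packages the merge-based idea into a single function. The name coalesce_join and the ".src" suffix are hypothetical. It assumes one key column whose values are unique in both data.frames.

```r
#hypothetical helper: left join `source` onto `target` by the key column `by`,
#then fill NAs in the columns both tables share with the values from `source`
coalesce_join <- function(target, source, by = "id") {
    #keep every row of target; shared columns coming from source get a ".src" suffix
    m <- merge(target, source, by = by, all.x = TRUE, sort = FALSE,
               suffixes = c("", ".src"))
    #merge() with sort = FALSE does not guarantee row order, so restore target's order
    m <- m[match(target[[by]], m[[by]]), , drop = FALSE]
    for (col in setdiff(intersect(names(target), names(source)), by)) {
        src <- paste0(col, ".src")
        miss <- is.na(m[[col]])
        m[[col]][miss] <- m[[src]][miss]
        m[[src]] <- NULL  # drop the helper column once it has been used
    }
    rownames(m) <- NULL
    m
}

d1 <- data.frame(id = 1:3, x = c(1, NA, 4), y = c(NA, 3, NA))
d2 <- data.frame(id = 4:1, x = c(NA, 20, 14, 15), y = c(30, 2, NA, 97), z = letters[1:4])
res <- coalesce_join(d1, d2)
res
```

Columns that exist only in d2 (here z) come along through the join. The NA filling happens one column at a time, so x and y stay numeric.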

If you don't mind, we can rewrite your question as a general matrix-coalesce question (i.e. any number of matrices, columns and rows), which does not seem to have been asked before.
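Here is a minimal base R sketch of that matrix-coalesce generalisation. It assumes every matrix has row and column names that identify its cells. The function name coalesce_matrices is hypothetical. The result covers the union of all row and column names, and each cell takes the first non-NA value in argument order. Only the numeric columns of the question's data are used, because a matrix holds a single type.

```r
#hypothetical helper: cell-wise coalesce of any number of matrices,
#aligned by row names and column names (all matrices must have dimnames)
coalesce_matrices <- function(...) {
    mats <- list(...)
    rows <- Reduce(union, lapply(mats, rownames))
    cols <- Reduce(union, lapply(mats, colnames))
    ans <- matrix(NA, nrow = length(rows), ncol = length(cols),
                  dimnames = list(rows, cols))
    for (m in mats) {
        #the block of ans covered by m, in m's own row/column order
        sub <- ans[rownames(m), colnames(m), drop = FALSE]
        miss <- is.na(sub)
        sub[miss] <- m[miss]
        ans[rownames(m), colnames(m)] <- sub
    }
    ans
}

#the numeric columns of d1 and d2 from the question, as matrices keyed by id
m1 <- cbind(x = c(1, NA, 4), y = c(NA, 3, NA))
rownames(m1) <- as.character(1:3)
m2 <- cbind(x = c(NA, 20, 14, 15), y = c(30, 2, NA, 97))
rownames(m2) <- as.character(4:1)
res <- coalesce_matrices(m1, m2)
res
```

Unlike the expected output, rows that appear only in later matrices (id 4 here) are kept. res[rownames(m1), ] restricts the result to the rows of the first matrix.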

Another base R solution is a hack of coalesce1a from "How to implement coalesce efficiently in R":

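coalesce.mat <- function(...) {
    ans <- ..1
    for (elt in list(...)[-1]) {
        #align the rows of elt with those of ans by id
        rn <- match(ans$id, elt$id)
        #fill NAs one column at a time so each column keeps its own type
        for (col in names(ans)) {
            miss <- is.na(ans[[col]])
            ans[[col]][miss] <- elt[rn, col][miss]
        }
    }
    ans
}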

#pad each data.frame with the columns it lacks (filled with NA), then coalesce
allcols <- Reduce(union, lapply(list(d1, d2), names))
do.call(coalesce.mat, 
    lapply(list(d1, d2), function(x) {
        x[, setdiff(allcols, names(x))] <- NA
        x
    }))



Edit:

A possible data.table solution, using coalesce1a by Martin Morgan from "How to implement coalesce efficiently in R":

coalesce1a <- function(...) {
    ans <- ..1
    for (elt in list(...)[-1]) {
        i <- which(is.na(ans))
        ans[i] <- elt[i]
    }
    ans
}

#convert both data.frames to data.tables (by reference)
setDT(d1)
setDT(d2)

#melt into long format and full outer join the two tables;
#melt(d2) coerces x, y and z to a single type (here character), with a warning
mdt <- merge(melt(d1, id.vars="id"), melt(d2, id.vars="id"), by=c("id","variable"), all=TRUE)

#coalesce value.x (from d1) with value.y (from d2)
mdt[, value := do.call(coalesce1a, .SD), .SDcols=grep("value", names(mdt), value=TRUE)]

#pivot back to the original wide format and keep only the ids in d1
dcast.data.table(mdt, id ~ variable, value.var="value")[
    d1, .SD, on=.(id)]
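coalesce1a itself is an ordinary vector coalesce. Each later argument fills only the positions that are still NA, and R's usual type promotion applies when the arguments differ in type. The following standalone check uses made-up vectors:

```r
coalesce1a <- function(...) {
    ans <- ..1
    for (elt in list(...)[-1]) {
        #positions still missing after the previous arguments
        i <- which(is.na(ans))
        ans[i] <- elt[i]
    }
    ans
}

res <- coalesce1a(c(1, NA, NA, NA), c(NA, 2, NA, NA), c(9, 9, 3, NA))
res    # 1 2 3 NA
#mixing numeric and character arguments promotes the result to character
mixed <- coalesce1a(c(1, NA), c("a", "b"))
mixed  # "1" "b"
```

The same promotion is one reason the coalesced value column in the long-format table is character once a character column such as z is involved.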
