Replace missing values (NA) in one data set with values from another where columns match


Question

I have a data frame (datadf) with 3 columns, 'x', 'y', and 'z'. Several 'x' values are missing (NA). 'y' and 'z' are non-measured variables.

x    y z
153  a 1
163  b 1
NA   d 1
123  a 2 
145  e 2
NA   c 2 
NA   b 1
199  a 2

I have another data frame (imputeddf) with the same three columns:

 x  y z
123 a 1
145 a 2
124 b 1
168 b 2
123 c 1
176 c 2
184 d 1
101 d 2

I wish to replace the NA values of 'x' in 'datadf' with values from 'imputeddf' where 'y' and 'z' match between the two data sets (each combination of 'y' and 'z' has its own value of 'x' to fill in).

Desired result:

x    y z
153  a 1
163  b 1
184  d 1
123  a 2 
145  e 2
176  c 2 
124  b 1
199  a 2

I am trying things like:

finaldf <- datadf
finaldf$x <- if(datadf[!is.na(datadf$x)]){ddply(datadf, x=imputeddf$x[datadf$y == imputeddf$y & datadf$z == imputeddf$z])}else{datadf$x}

But it is not working.

What is the best way for me to fill in the NA values using my imputed value data frame?

Recommended answer

I would do this:

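library(data.table)
# DF1 and DF2 presumably stand for datadf and imputeddf; convert both by reference
setDT(DF1); setDT(DF2)

# Update join on y and z: take DF2's x (i.x) only where DF1's x is NA
DF1[DF2, x := ifelse(is.na(x), i.x, x), on=c("y","z")]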

which gives

     x y z
1: 153 a 1
2: 163 b 1
3: 184 d 1
4: 123 a 2
5: 145 e 2
6: 176 c 2
7: 124 b 1
8: 199 a 2
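
To reproduce the result above, here is a minimal, self-contained sketch. It assumes that `DF1` and `DF2` in the answer correspond to `datadf` and `imputeddf` from the question, and it rebuilds both tables by hand from the values shown there.

```r
library(data.table)

# Assumption: DF1 plays the role of datadf and DF2 of imputeddf from the question.
DF1 <- data.table(
  x = c(153, 163, NA, 123, 145, NA, NA, 199),
  y = c("a", "b", "d", "a", "e", "c", "b", "a"),
  z = c(1, 1, 1, 2, 2, 2, 1, 2)
)
DF2 <- data.table(
  x = c(123, 145, 124, 168, 123, 176, 184, 101),
  y = c("a", "a", "b", "b", "c", "c", "d", "d"),
  z = c(1, 2, 1, 2, 1, 2, 1, 2)
)

# Update join: for every row of DF2, find the rows of DF1 with the same (y, z).
# Inside j, x is DF1's own column and i.x is the matching value from DF2.
DF1[DF2, x := ifelse(is.na(x), i.x, x), on = c("y", "z")]

print(DF1)
```

The row with y = "e" has no counterpart in DF2, so the join never touches it and its original value is kept.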

Comments. This approach isn't so great, since it merges the whole of DF1, while we only need to merge the subset where is.na(x). An improved version (thanks, @Arun) looks like:

DF1[is.na(x), x := DF2[.SD, x, on=c("y", "z")]]

This way is analogous to @RHertel's answer.
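
The subset-based version can be checked on the question's data in the same way. The sketch below again assumes that `DF1` and `DF2` stand for `datadf` and `imputeddf`, and it further assumes that each (y, z) pair occurs at most once in `DF2`, so the lookup returns exactly one value per NA row.

```r
library(data.table)

# Assumption: DF1 plays the role of datadf and DF2 of imputeddf from the question.
DF1 <- data.table(
  x = c(153, 163, NA, 123, 145, NA, NA, 199),
  y = c("a", "b", "d", "a", "e", "c", "b", "a"),
  z = c(1, 1, 1, 2, 2, 2, 1, 2)
)
DF2 <- data.table(
  x = c(123, 145, 124, 168, 123, 176, 184, 101),
  y = c("a", "a", "b", "b", "c", "c", "d", "d"),
  z = c(1, 2, 1, 2, 1, 2, 1, 2)
)

# Only rows with a missing x are touched. .SD is that subset of DF1, and
# DF2[.SD, x, on = ...] looks up DF2's x for each of its (y, z) pairs.
DF1[is.na(x), x := DF2[.SD, x, on = c("y", "z")]]

print(DF1)
```

An NA row whose (y, z) pair is absent from DF2 would simply stay NA, because the join returns NA for unmatched rows by default.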
