Using ifelse() to replace NAs in one data frame by referencing another data frame of different length


Question

This is my first question on Stack Overflow, so if it's already been answered, my apologies, and please let me know where to look.

I already reviewed the following two posts and think they might answer my question, although I'm struggling to see how:

1) Conditional replacement of values in a data.frame
2) Creating a function to replace NAs from one data.frame with values from another

With that said, I'm trying to replace NAs in one data frame by referencing another data frame of a different (shorter) length and pulling in replacement values from column "B" where the values for column "A" in each data frame match.

I've modified the data, below, for simplicity and illustration, although the concept is the same in the actual data. FYI, in the real second data frame, there are also no duplicates in column "A".

Here's the first data frame (df1):

> df1
    B          C  A
1  NA 2012-10-01  0
2  NA 2012-10-01  5
3   4 2012-10-01 10
4  NA 2012-10-01 15
5  NA 2012-10-01 20
6  20 2012-10-01 25
7  NA 2012-10-01  0
8  NA 2012-10-01  5
9   5 2012-10-01 10
10  5 2012-10-01 15

> str(df1)
'data.frame':   10 obs. of  3 variables:
 $ B: num  NA NA 4 NA NA 20 NA NA 5 5
 $ C: Factor w/ 1 level "2012-10-01": 1 1 1 1 1 1 1 1 1 1
 $ A: num  0 5 10 15 20 25 0 5 10 15

And here's the second data frame (df2):

> df2
   A         B
1  0 1.7169811
2  5 0.3396226
3 10 0.1320755
4 15 0.1509434
5 20 0.0754717
6 25 2.0943396

> str(df2)
'data.frame':   6 obs. of  2 variables:
 $ A: int  0 5 10 15 20 25
 $ B: num  1.717 0.3396 0.1321 0.1509 0.0755 ...

I think I'm pretty close with the following code:

> ifelse(is.na(df1$B) == TRUE, df2$B[df2$A == df1$A], df1$B)
 [1]  1.7169811  0.3396226  4.0000000  0.1509434  0.0754717 20.0000000         NA         NA
 [9]  5.0000000  5.0000000
Warning message:
In df2$A == df1$A :
  longer object length is not a multiple of shorter object length

Obviously, I want the 7th and 8th output elements to be 1.7169811 and 0.3396226, rather than NA.

Thanks, in advance, for any help, and, once again, thanks for your patience!
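
For reference, the NAs in positions 7 and 8 appear to come from vector recycling in the `==` comparison. The sketch below rebuilds the relevant columns as plain vectors (the names `df1_A`, `df2_A`, `df2_B` and `same` are made up for illustration) and shows each step separately; the warning is suppressed here only to keep the output short.

```r
# Rebuild the relevant columns from the question as plain vectors.
df1_A <- c(0, 5, 10, 15, 20, 25, 0, 5, 10, 15)
df2_A <- c(0L, 5L, 10L, 15L, 20L, 25L)
df2_B <- c(1.7169811, 0.3396226, 0.1320755, 0.1509434, 0.0754717, 2.0943396)

# '==' recycles the shorter vector, so the result has 10 elements
# (R warns because 10 is not a multiple of 6).
same <- suppressWarnings(df2_A == df1_A)
length(same)

# Indexing a length-6 vector with 10 logical values reads past its end,
# so positions 7 to 10 come back as NA.
df2_B[same]
```

ifelse() then picks elements 7 and 8 of that padded vector, which are NA.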

Solution

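Try the following code, which takes your original statement and makes a small tweak in the TRUE argument of the ifelse function. Be aware that df2$B[df2$A %in% df1$A] returns the matching values in df2's row order, and ifelse() then recycles them to the length of df1, so the result below is correct only because df1$A repeats its values in the same order as df2$A.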

> df1$B <- ifelse(is.na(df1$B) == TRUE, df2$B[df2$A %in% df1$A], df1$B)   
#                         Switched '==' to '%in%' ---^
> df1
            B          C  A
1   1.7169811 2012-10-01  0
2   0.3396226 2012-10-01  5
3   4.0000000 2012-10-01 10
4   0.1509434 2012-10-01 15
5   0.0754717 2012-10-01 20
6  20.0000000 2012-10-01 25
7   1.7169811 2012-10-01  0
8   0.3396226 2012-10-01  5
9   5.0000000 2012-10-01 10
10  5.0000000 2012-10-01 15
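
A lookup that does not depend on row order can be sketched with base R's match(), which returns, for each element of df1$A, the position of the same value in df2$A (or NA if there is none). This is offered as an alternative under the question's stated assumption that df2$A has no duplicates; the names `idx` and `filled` are illustrative, and the data frames are rebuilt from the printed output above.

```r
# Rebuild the example data from the question.
df1 <- data.frame(
  B = c(NA, NA, 4, NA, NA, 20, NA, NA, 5, 5),
  C = factor(rep("2012-10-01", 10)),
  A = c(0, 5, 10, 15, 20, 25, 0, 5, 10, 15)
)
df2 <- data.frame(
  A = c(0L, 5L, 10L, 15L, 20L, 25L),
  B = c(1.7169811, 0.3396226, 0.1320755, 0.1509434, 0.0754717, 2.0943396)
)

# For every row of df1, find the row of df2 with the same A (NA if none).
idx <- match(df1$A, df2$A)

# Fill only the missing B values; existing values are left alone.
filled <- ifelse(is.na(df1$B), df2$B[idx], df1$B)
filled
```

Because df2$B[idx] already has one element per row of df1, nothing is recycled, and the result should stay aligned even if the rows of df1 are shuffled or df1$A contains values in a different order than df2$A.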
