Replace NA values in dataframe variable with values from other dataframe by "ID"


Question

I would like to know if there is a more concise way to replace NA values for a variable in a dataframe than what I did below. The code below seems longer than what I think should be necessary in R. For example, I may be unaware of some package or tool that would do this more succinctly.

Is there a way to replace or merge values only if they are NA? After merging two dataframes using all.x = TRUE, I have some NA values. I'd like to replace those with information from another dataframe, using a common variable to link the replacement.

# get dataframes
breaks <- structure(list(Break = 1:11, Value = c(2L, 13L, 7L, 9L, 40L, 
21L, 10L, 37L, 7L, 26L, 42L)), .Names = c("Break", "Value"), class = "data.frame", row.names = c(NA, 
-11L))

fsites <- structure(list(Site = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 
3L, 3L, 3L, 3L), Plot = c(0L, 1L, 2L, 3L, 4L, 0L, 1L, 2L, 0L, 
1L, 2L, 3L, 4L, 5L), Break = c(1L, 5L, 7L, 8L, 11L, 1L, 6L, 11L, 
1L, 4L, 6L, 8L, 9L, 11L)), .Names = c("Site", "Plot", "Break"
), class = "data.frame", row.names = c(NA, -14L))

bps <- structure(list(Site = c(1L, 1L, 1L, 1L, 2L, 2L, 3L, 3L, 3L, 3L, 
3L), Plot = c(0L, 1L, 2L, 3L, 1L, 2L, 0L, 1L, 2L, 3L, 4L), Value = c(0.393309653, 
0.12465733, 0.27380161, 0.027288989, 0.439712533, 0.289724079, 
0.036429062, 0.577460008, 0.820375917, 0.323217357, 0.28637503
)), .Names = c("Site", "Plot", "Value"), class = "data.frame", row.names = c(NA, 
-11L))

# merge fsites and bps, keeping every row of fsites (unmatched rows get NA)
df1 <- merge(fsites, bps, by=c("Site", "Plot"), all.x=TRUE)

# merge df1 and breaks to get the values that will eventually replace the NA
# values in df1$Value (named Value.x after this merge); "Break" is the ID by
# which to replace the NA values
df2 <- merge(df1, breaks, by=c("Break"))

# Create a new column 'Value' that uses Value.x, unless NA, then Value.y
df3 <- df2
df3$Value <- df2$Value.x
df2.na <- is.na(df2$Value.x)
df3$Value[df2.na] <- df2$Value.y[df2.na]

# get rid of unnecessary columns
cols <- c(1:3,6)
df4 <- df3[,cols]


Recommended answer

At the stage where there is only (breaks, fsites, bps and) df1 around, i.e. right after the first merge:

df1$Value <- ifelse(is.na(df1$Value), 
                            breaks$Value[match(df1$Break, breaks$Break)], df1$Value)

#> df1
#   Site Plot Break       Value
#1     1    0     1  0.39330965
#2     1    1     5  0.12465733
#3     1    2     7  0.27380161
#4     1    3     8  0.02728899
#5     1    4    11 42.00000000
#6     2    0     1  2.00000000
#7     2    1     6  0.43971253
#8     2    2    11  0.28972408
#9     3    0     1  0.03642906
#10    3    1     4  0.57746001
#11    3    2     6  0.82037592
#12    3    3     8  0.32321736
#13    3    4     9  0.28637503
#14    3    5    11 42.00000000
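
To make the lookup in the one-liner above easier to follow, here is a minimal sketch that splits it into steps. The data frames `lookup` and `df` are hypothetical toy data, not the question's data, and the sketch fills the NA positions with a logical index instead of ifelse(); the result should be the same either way.

```r
# Hypothetical toy data: `lookup` plays the role of breaks, `df` the role of df1
lookup <- data.frame(Break = 1:3, Value = c(10, 20, 30))
df <- data.frame(Break = c(3, 1, 2, 1), Value = c(0.5, NA, NA, 0.7))

# match() returns, for each df$Break, the row position of that key in lookup$Break
idx <- match(df$Break, lookup$Break)

# Overwrite only the rows where Value is NA, using the matched lookup rows
na_rows <- is.na(df$Value)
df$Value[na_rows] <- lookup$Value[idx[na_rows]]

df$Value
# [1]  0.5 10.0 20.0  0.7
```

A key that is missing from the lookup table makes match() return NA, so the corresponding value simply stays NA rather than raising an error.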

#just to test with your `df4`
> sort(df1$Value) == sort(df4$Value)
[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
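
The check above compares sorted values, presumably because the two results come out in different row orders (merge() sorts by its key columns by default). A slightly stricter sketch, again on hypothetical toy data (`res_a` and `res_b` stand in for df1 and df4), aligns the rows by Site and Plot first and uses all.equal(), which tolerates tiny floating-point differences:

```r
# Hypothetical toy results standing in for df1 and df4
res_a <- data.frame(Site = c(1, 1, 2), Plot = c(0, 1, 0), Value = c(0.1, 42, 2))
res_b <- res_a[c(3, 1, 2), ]  # same rows, different row order

# Put both results in the same key order before comparing
res_a <- res_a[order(res_a$Site, res_a$Plot), ]
res_b <- res_b[order(res_b$Site, res_b$Plot), ]

# all.equal() returns TRUE or a description of the differences,
# so wrap it in isTRUE() to get a plain logical
same <- isTRUE(all.equal(res_a$Value, res_b$Value))
same
```

Comparing by key guards against the case where two results contain the same set of values but attached to different rows, which a comparison of sorted values would not detect.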
