Replace values in multiple columns based on the value in an adjacent column
Problem description
# Create a data frame
> df <- data.frame(a = rnorm(7), b = rnorm(7), c = rnorm(7), threshold = rnorm(7))
> df <- round(abs(df), 2)
>
> df
     a    b    c threshold
1 1.17 0.27 1.26      0.19
2 1.41 1.57 1.23      0.97
3 0.16 0.11 0.35      1.34
4 0.03 0.04 0.10      1.50
5 0.23 1.10 2.68      0.45
6 0.99 1.36 0.17      0.30
7 0.28 0.68 1.22      0.56
>
>
# Replace values in columns a, b, and c with NA if > value in threshold
> df[1:3][df[1:3] > df[4]] <- "NA"
Error in Ops.data.frame(df[1:3], df[4]) :
‘>’ only defined for equally-sized data frames
There could be some obvious solutions that I am incapable of producing. The intent is to replace values in columns "a", "b", and "c" with NA if the value is larger than that in "threshold". And I need to do that row-by-row.
If I had done it right, the df would look like this:
     a    b    c threshold
1   NA   NA   NA      0.19
2   NA   NA   NA      0.97
3 0.16 0.11 0.35      1.34
4 0.03 0.04 0.10      1.50
5 0.23   NA   NA      0.45
6   NA   NA 0.17      0.30
7 0.28   NA   NA      0.56
I had also tried the apply() approach but to no avail. Can you help, please??
Recommended answer
The problem with your code is the use of df[4] instead of df[, 4]. The difference is that df[4] returns a data.frame with one column, whereas df[, 4] returns a vector.
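To see the difference in practice, the sketch below builds a two-row data frame from rows 1 and 3 of the question's data and inspects both forms. The object names res and mask are only illustrative.

```r
# Two rows taken from the question's data (rows 1 and 3)
df <- data.frame(a = c(1.17, 0.16), b = c(0.27, 0.11), c = c(1.26, 0.35),
                 threshold = c(0.19, 1.34))

class(df[4])     # a one-column data.frame
dim(df[4])       # 2 rows, 1 column
class(df[, 4])   # a plain numeric vector

# Comparing a 2 x 3 data frame with a 2 x 1 data frame raises an error;
# capture the condition object so the script keeps running
res <- tryCatch(df[1:3] > df[4], error = function(e) e)
print(conditionMessage(res))

# Comparing with the vector recycles the threshold down each column,
# so each row is compared with its own threshold; the result is a
# logical matrix with one row per row of df
mask <- df[1:3] > df[, 4]
print(mask)
```

The first row of mask is all TRUE (every value exceeds 0.19) and the second is all FALSE (nothing exceeds 1.34), which is exactly the row-by-row behaviour the question asks for.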
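That is why
df[1:3] > df[4]
returns
Error in Ops.data.frame(df[1:3], df[4]) :
‘>’ only defined for equally-sized data frames
Here df[1:3] is 7 x 3 while df[4] is 7 x 1, so the dimensions do not match. Comparing against the vector df[, 4] instead recycles the threshold values down each of the three columns, so every value in a, b and c is compared with the threshold in its own row; the result is a logical matrix that can index df[1:3] directly. Note also that the replacement value should be NA rather than the string "NA": assigning a character string would turn the affected columns into character vectors. With both changes this works as expected: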
df[1:3][df[1:3] > df[, 4]] <- NA
df
#     a    b    c threshold
#1 0.63 0.74   NA      0.78
#2   NA   NA 0.04      0.07
#3 0.84 0.31 0.02      1.99
#4   NA   NA   NA      0.62
#5   NA   NA   NA      0.06
#6   NA   NA   NA      0.16
#7 0.49   NA 0.92      1.47
Data
set.seed(1)
df <- data.frame(a = rnorm(7), b = rnorm(7), c = rnorm(7), threshold = rnorm(7))
df <- round(abs(df), 2)
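Since the question mentions trying apply(), here is a sketch of two other base R ways to get the same result, offered as alternatives rather than as part of the original answer. Both reuse the set.seed(1) data above: the lapply() version works column by column with replace(), and the apply() version works row by row. The names res_index, res_lapply and res_apply are only illustrative.

```r
set.seed(1)
df <- data.frame(a = rnorm(7), b = rnorm(7), c = rnorm(7), threshold = rnorm(7))
df <- round(abs(df), 2)

# Approach from the answer: index df[1:3] with a logical matrix
res_index <- df
res_index[1:3][res_index[1:3] > res_index[, 4]] <- NA

# Column by column: replace() sets every element above that row's
# threshold to NA; x and df$threshold are compared element-wise
res_lapply <- df
res_lapply[1:3] <- lapply(df[1:3], function(x) replace(x, x > df$threshold, NA))

# Row by row: apply() converts df to a numeric matrix and passes each
# row in as a named numeric vector
rows <- apply(df, 1, function(r) {
  r[1:3][r[1:3] > r["threshold"]] <- NA
  r
})
# apply() returns one column per input row, so transpose before converting
res_apply <- as.data.frame(t(rows))

print(res_lapply)
print(res_apply)
```

All three results should match the output shown in the answer. The indexing version is the most direct; the apply() version is only safe here because every column is numeric, since apply() first coerces the whole data frame to a single matrix type.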