R: Find if value is within a certain percentage of any other value in its row
Problem description
I have a data frame of values, and for each value I want to determine whether it is within, say, 10% of any other value in its row. I want to do this generically, as I do not know how many columns I will have or what they will be called.
Some values are NA. If all other values in the row are NA, I want to return TRUE. For the values that are themselves NA, I want to return FALSE. The values are all non-negative (they can be 0).
For example, say I have the following data frame:
dataDF <- data.frame(
  a = c(100, 250, NA, 700, 0),
  b = c(105, 300, 280, NA, 0),
  c = c(200, 400, 280, NA, 0)
)
In the first row we have a = 100, b = 105 and c = 200. a and b are within 10% of each other, so both are TRUE; c is not within 10% of either a or b, so it is FALSE.
In the second row no values are within 10% of each other, so all are FALSE.
In the third row b and c are equal, so both are TRUE; a is NA, so it is FALSE.
In the fourth row only a has a value, so it is returned as TRUE; b and c are NA, so they are FALSE.
In the final row all values are the same, so all are TRUE.
So my output would be:
data.frame(
  a = c( TRUE, FALSE, FALSE,  TRUE, TRUE),
  b = c( TRUE, FALSE,  TRUE, FALSE, TRUE),
  c = c(FALSE, FALSE,  TRUE, FALSE, TRUE)
)
How I calculate the percentage difference doesn't really matter, but the way I was going to do it is to divide the absolute difference by the average of the two values, so that I get the same result whichever of the two values I start from.
For example, the percentage difference between 100 and 105 would be:
abs(100 - 105)/((100 + 105)/2) = 5/102.5 = 0.0488
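As a minimal sketch, the calculation above could be wrapped in a small helper. The name `pct_diff` is hypothetical, and treating two equal values (including 0 and 0, which would otherwise give 0/0 = NaN) as a difference of 0 is an assumption made here to match the example where a row of zeros is all TRUE.

```r
# Symmetric percentage difference: absolute difference divided by the
# mean of the two values (hypothetical helper, not from the original post).
pct_diff <- function(x, y) {
  d <- abs(x - y) / ((x + y) / 2)
  # Assumption: two equal values differ by 0, even when both are 0
  # (where the formula alone would yield NaN).
  d[!is.na(x) & !is.na(y) & x == y] <- 0
  d
}

pct_diff(100, 105)  # approximately 0.0488
pct_diff(105, 100)  # same value, the measure is symmetric
pct_diff(0, 0)      # 0
```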
Any ideas on the quickest and neatest way of doing this would be appreciated.
Thanks
Solution
Define a function and apply it to each row of your data.frame:
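fun <- function(vec) {
  n <- length(vec)
  # A row with no values at all: nothing can be TRUE
  if (all(is.na(vec))) return(rep(FALSE, n))
  # Only one distinct non-NA value (alone or repeated): TRUE for every non-NA entry
  noNA <- vec[!is.na(vec)]
  if (length(unique(noNA)) == 1) return(!is.na(vec))
  res <- rep(FALSE, n)
  for (i in seq_len(n)) {
    # TRUE if vec[i] differs from some other value by at most 10% of that other value
    if (any(abs(vec[i] - vec[-i]) <= vec[-i] * 0.1, na.rm = TRUE)) res[i] <- TRUE
  }
  res
}

# apply() returns one column per input row, so transpose the result
output <- data.frame(t(apply(dataDF, 1, fun)))
names(output) <- names(dataDF)
output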
This gives the desired result:
#       a     b     c
#1   TRUE  TRUE FALSE
#2  FALSE FALSE FALSE
#3  FALSE  TRUE  TRUE
#4   TRUE FALSE FALSE
#5   TRUE  TRUE  TRUE
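Note that the function above measures the difference relative to the other value, not relative to the average of the two as described in the question. A possible variant that uses the symmetric measure and replaces the loop with `outer()` is sketched below. The name `within_pct` and the `tol` parameter are hypothetical, and the comparison is written without division so that 0 versus 0 counts as within tolerance.

```r
# Sketch of a vectorised alternative (hypothetical names, not the answer's code).
within_pct <- function(vec, tol = 0.1) {
  ok  <- !is.na(vec)
  res <- rep(FALSE, length(vec))
  vals <- vec[ok]
  if (length(vals) == 0) return(res)   # all NA: everything FALSE
  if (length(vals) == 1) {             # a lone value counts as TRUE
    res[ok] <- TRUE
    return(res)
  }
  # Pairwise test |x - y| <= tol * mean(x, y), with no division involved
  close <- outer(vals, vals, function(x, y) abs(x - y) <= tol * (x + y) / 2)
  diag(close) <- FALSE                 # ignore each value's match with itself
  res[ok] <- rowSums(close) > 0
  res
}

dataDF <- data.frame(
  a = c(100, 250, NA, 700, 0),
  b = c(105, 300, 280, NA, 0),
  c = c(200, 400, 280, NA, 0)
)
output <- as.data.frame(t(apply(dataDF, 1, within_pct)))
names(output) <- names(dataDF)
output
```

On the example data this produces the same TRUE/FALSE pattern as the loop version; the two definitions can disagree only for pairs whose difference is close to the 10% boundary.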