R: Find if value is within a certain percentage of any other value in its row


Problem description

I have a dataframe of values and for each value in the dataframe I want to determine if it is within say 10% of any other value in its row. I want to do this generically as I do not know how many columns I will have nor the names of the columns.

Some values are NA. If all the other values in a row are NA, I want to return TRUE for the one value that is present. For the values that are themselves NA, I want to return FALSE. The values are all non-negative (they can be 0).

For example, say I have the following dataframe:

dataDF <- data.frame(
                     a = c(100, 250,  NA, 700,   0),
                     b = c(105, 300, 280,  NA,   0),
                     c = c(200, 400, 280,  NA,   0)
                     )

In the first row we have a = 100, b = 105 and c = 200. a and b are within 10% of each other, so both are TRUE; c is not within 10% of either a or b, so it is FALSE.

In the second row no values are within 10% of each other, so all are FALSE.

In the third row b and c are equal, so both are TRUE; a is NA, so it is FALSE.

In the fourth row only a has a value, so a is TRUE; b and c are NA, so they are FALSE.

In the final row all values are the same, so all are TRUE.

So my output would be

data.frame(
           a = c( TRUE, FALSE, FALSE,  TRUE, TRUE),
           b = c( TRUE, FALSE,  TRUE, FALSE, TRUE),
           c = c(FALSE, FALSE,  TRUE, FALSE, TRUE)
          )

How I calculate the percentage difference doesn't really matter, but the way I was going to do it is to divide the absolute difference by the average of the 2 values, so that I get the same value whichever way round I compare them.

So for example to calculate the percentage difference between 100 and 105 it would be:

abs(100 - 105)/((100 + 105)/2) = 5/102.5 = 0.0488
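
As a quick check of the arithmetic above, the symmetric percentage difference can be written as a small R helper. This is only a sketch: the name `pct_diff` is hypothetical, and returning 0 for two zeros is an assumption made here to avoid 0/0 (the question only says that equal values should count as a match).

```r
# Hypothetical helper: symmetric percentage difference between x and y,
# i.e. the absolute difference divided by the mean of the two values.
pct_diff <- function(x, y) {
  d <- abs(x - y) / ((x + y) / 2)
  # Assumption: two zeros are identical, so report 0 instead of NaN (0/0).
  d[!is.na(x) & !is.na(y) & x == 0 & y == 0] <- 0
  d
}

pct_diff(100, 105)  # 5 / 102.5, roughly 0.0488
pct_diff(105, 100)  # same value: the measure is symmetric
```

Because the denominator is the mean of the pair, swapping the arguments does not change the result, which is the property asked for above.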

Any ideas on the quickest and neatest way of doing this would be appreciated.

Thanks

Solution

Define a function and apply it to each row of your data.frame:

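fun <- function(vec)
{
  n <- length(vec)

  # Entire row is NA: nothing can match
  if (all(is.na(vec)))
    return(rep(FALSE, n))

  noNA <- vec[!is.na(vec)]

  # Only one distinct non-NA value (a lone value, or all values equal):
  # every non-NA entry is TRUE, every NA entry is FALSE
  if (length(unique(noNA)) == 1)
    return(!is.na(vec))

  res <- rep(FALSE, n)

  # TRUE if vec[i] is within 10% of at least one other value in the row;
  # comparisons involving NA are dropped by na.rm = TRUE
  for (i in seq_len(n))
    if (any(abs(vec[i] - vec[-i]) <= vec[-i] * 0.1, na.rm = TRUE))
      res[i] <- TRUE

  res
}

# apply() returns one column per input row, so transpose the result back
output <- data.frame(t(apply(dataDF, 1, fun)))
names(output) <- names(dataDF)
output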

This gives the desired result:

#      a     b     c
#1  TRUE  TRUE FALSE
#2 FALSE FALSE FALSE
#3 FALSE  TRUE  TRUE
#4  TRUE FALSE FALSE
#5  TRUE  TRUE  TRUE
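
Note that the comparison in `fun` measures the difference against the other value (`vec[-i] * 0.1`) rather than against the mean of the pair, as proposed in the question. For the example data both give the same answer. A possible variant that uses the symmetric measure and avoids the explicit loop is sketched below; `within_pct` and its `tol` argument are hypothetical names, and treating 0 versus 0 as a match is an assumption based on the last example row.

```r
# Hypothetical variant: for each element of vec, is it within `tol`
# (symmetric percentage difference) of any other non-NA element?
within_pct <- function(vec, tol = 0.1) {
  ok <- !is.na(vec)
  # A lone non-NA value counts as TRUE, as required in the question.
  if (sum(ok) == 1) return(unname(ok))
  # Pairwise |x - y| <= tol * mean(x, y); written without division so
  # that 0 vs 0 gives TRUE instead of NaN.
  close <- outer(vec, vec, function(x, y) abs(x - y) <= tol * (x + y) / 2)
  diag(close) <- FALSE  # do not compare a value with itself
  unname(rowSums(close, na.rm = TRUE) > 0)
}

dataDF <- data.frame(
  a = c(100, 250,  NA, 700, 0),
  b = c(105, 300, 280,  NA, 0),
  c = c(200, 400, 280,  NA, 0)
)

result <- as.data.frame(t(apply(dataDF, 1, within_pct)))
names(result) <- names(dataDF)
result
```

`outer()` builds the full pairwise comparison matrix for a row, so this trades the loop for memory proportional to the square of the number of columns; for wide data frames the loop version may be the more economical choice.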
