告诉ifelse忽略NA的直接方法 [英] Direct way of telling ifelse to ignore NA

查看:146
本文介绍了告诉ifelse忽略NA的直接方法的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

此处所述,当ifelse(test, yes, no)中的测试条件为NA时,评估结果也为NA .因此,以下返回...

As explained here when the test condition in ifelse(test, yes, no) is NA, the evaluation is also NA. Hence the following returns...

df <- data.frame(a = c(1, 1, NA, NA, NA ,NA),
                 b = c(NA, NA, 1, 1, NA, NA),
                 c = c(rep(NA, 4), 1, 1))
ifelse(df$a==1, "a==1", 
    ifelse(df$b==1, "b==1", 
        ifelse(df$c==1, "c==1", NA)))
#[1] "a==1" "a==1" NA     NA     NA     NA    

...而不是所需的

#[1] "a==1" "a==1" "b==1" "b==1"  "c==1" "c==1" 

正如Cath所建议的,我可以通过正式指定测试条件不应包含NA来规避此问题:

As suggested by Cath, I can circumvent this problem by formally specifying that the test condition should not include NA:

ifelse(df$a==1 &  !is.na(df$a), "a==1", 
    ifelse(df$b==1 & !is.na(df$b), "b==1", 
        ifelse(df$c==1 & !is.na(df$c), "c==1", NA)))

但是,正如akrun还指出的那样,随着列数的增加,该解决方案变得相当冗长.

However, as akrun also noted, this solution becomes rather lengthy with increasing number of columns.

一种解决方法是先将所有NA替换为data.frame中不存在的值(例如,在本例中为2):

A workaround would be to first replace all NAs with a value not present in the data.frame (e.g, 2 in this case):

df_noNA <- data.frame(a = c(1, 1, 2, 2, 2 ,2),
                 b = c(2, 2, 1, 1, 2, 2),
                 c = c(rep(2, 4), 1, 1))

ifelse(df_noNA$a==1, "a==1", 
    ifelse(df_noNA$b==1, "b==1", 
        ifelse(df_noNA$c==1, "c==1", NA)))
#[1] "a==1" "a==1" "b==1" "b==1"  "c==1" "c==1" 

但是,我想知道是否有一种更直接的方式告诉ifelse忽略NAs ?还是为& !is.na编写函数最直接的方法?

However, I was wondering if there was a more direct way to tell ifelse to ignore NAs? Or is writing a function for & !is.na the most direct way?

ignorena <- function(column) {
        column ==1 & !is.na(column)
}
ifelse(ignorena(df$a), "a==1", 
    ifelse(ignorena(df$b), "b==1", 
        ifelse(ignorena(df$c), "c==1", NA)))
#[1] "a==1" "a==1" "b==1" "b==1"  "c==1" "c==1" 

推荐答案

您可以使用%in%而不是==来排序忽略NA s.

You can use %in% instead of == to sort-of ignore NAs.

ifelse(df$a %in% 1, "a==1", 
       ifelse(df$b %in% 1, "b==1", 
              ifelse(df$c %in% 1, "c==1", NA)))

不幸的是,与原始版本相比,这没有任何性能提升,而@arkun的解决方案快了约3倍.

Unfortunately, this does not give any performance gain compared to the original while @arkun's solution is about 3 times faster.

solution_original <- function(){
  ifelse(df$a==1 &  !is.na(df$a), "a==1", 
         ifelse(df$b==1 & !is.na(df$b), "b==1", 
                ifelse(df$c==1 & !is.na(df$c), "c==1", NA)))
}

solution_akrun <- function(){
  v1 <- names(df)[max.col(!is.na(df)) * NA^!rowSums(!is.na(df))]
  i1 <- !is.na(v1)
  v1[i1] <- paste0(v1[i1], "==1")
}

solution_mine <- function(x){
  ifelse(df$a %in% 1, "a==1", 
         ifelse(df$b %in% 1, "b==1", 
                ifelse(df$c %in% 1, "c==1", NA)))
}
set.seed(1)
df <- data.frame(a = sample(c(1, rep(NA, 4)), 1e6, T),
                 b = sample(c(1, rep(NA, 4)), 1e6, T),
                 c = sample(c(1, rep(NA, 4)), 1e6, T))
microbenchmark::microbenchmark(
  solution_original(),
  solution_akrun(),
  solution_mine()
)
## Unit: milliseconds
##                expr      min       lq     mean   median       uq       max neval
## solution_original() 701.9413 839.3715 845.0720 853.1960 875.6151 1051.6659   100
##    solution_akrun() 217.4129 242.5113 293.2987 253.2144 387.1598  564.3981   100
##     solution_mine() 698.7628 845.0822 848.6717 858.7892 877.9676 1006.2872   100

受此启发: R:处理TRUE,FALSE, NA和NaN

修改

在@arkun发表评论之后,我重述了基准并修改了声明.

Following the comment by @arkun, I redid the benchmark and revised the statement.

这篇关于告诉ifelse忽略NA的直接方法的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆