Direct way of telling ifelse to ignore NA
Problem description
As explained here, when the test condition in ifelse(test, yes, no) is NA, the evaluation is also NA. Hence the following returns...
df <- data.frame(a = c(1, 1, NA, NA, NA, NA),
                 b = c(NA, NA, 1, 1, NA, NA),
                 c = c(rep(NA, 4), 1, 1))
ifelse(df$a==1, "a==1",
       ifelse(df$b==1, "b==1",
              ifelse(df$c==1, "c==1", NA)))
#[1] "a==1" "a==1" NA     NA     NA     NA
...instead of the desired
#[1] "a==1" "a==1" "b==1" "b==1" "c==1" "c==1"
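The propagation can be reproduced in isolation with plain base R. The vector below mirrors column a of df and is only illustrative:

```r
# Any comparison with NA yields NA, not FALSE
NA == 1
#[1] NA

# ifelse() passes an NA test straight through: neither branch is used
ifelse(NA, "yes", "no")
#[1] NA

# So in the nested call, rows 3-6 already stop at the outermost test
c(1, 1, NA, NA, NA, NA) == 1
#[1] TRUE TRUE   NA   NA   NA   NA
```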
As suggested by Cath, I can circumvent this problem by formally specifying that the test condition should not include NA:
ifelse(df$a==1 & !is.na(df$a), "a==1",
       ifelse(df$b==1 & !is.na(df$b), "b==1",
              ifelse(df$c==1 & !is.na(df$c), "c==1", NA)))
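This works because R's & uses three-valued logic: NA & FALSE is FALSE, while NA & TRUE stays NA. A minimal illustration on a made-up vector x:

```r
x <- c(1, NA, 2)

x == 1
#[1]  TRUE    NA FALSE

!is.na(x)
#[1]  TRUE FALSE  TRUE

# Element 2 becomes NA & FALSE, which R evaluates to FALSE,
# so the NA from the comparison never reaches ifelse()
x == 1 & !is.na(x)
#[1]  TRUE FALSE FALSE

ifelse(x == 1 & !is.na(x), "x==1", "other")
#[1] "x==1"  "other" "other"
```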
However, as akrun also noted, this solution becomes rather lengthy with an increasing number of columns.
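One way to keep the code length constant as columns are added is to loop over the columns instead of nesting one ifelse per column. This is a sketch, not part of the original question: first_match_label is a hypothetical helper name, and it assumes that the first matching column (in column order) should win, as in the nested ifelse.

```r
# Hypothetical helper: label each row with the first column equal to `value`,
# treating NA as "no match". Columns are visited in reverse so that earlier
# columns overwrite later ones, mimicking the nested ifelse().
first_match_label <- function(df, value = 1) {
  out <- rep(NA_character_, nrow(df))
  for (nm in rev(names(df))) {
    hit <- df[[nm]] == value & !is.na(df[[nm]])
    out[hit] <- paste0(nm, "==", value)
  }
  out
}

df <- data.frame(a = c(1, 1, NA, NA, NA, NA),
                 b = c(NA, NA, 1, 1, NA, NA),
                 c = c(rep(NA, 4), 1, 1))
first_match_label(df)
#[1] "a==1" "a==1" "b==1" "b==1" "c==1" "c==1"
```

Adding a column to df requires no change to the call.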
A workaround would be to first replace all NAs with a value not present in the data.frame (e.g., 2 in this case):
df_noNA <- data.frame(a = c(1, 1, 2, 2, 2, 2),
                      b = c(2, 2, 1, 1, 2, 2),
                      c = c(rep(2, 4), 1, 1))
ifelse(df_noNA$a==1, "a==1",
       ifelse(df_noNA$b==1, "b==1",
              ifelse(df_noNA$c==1, "c==1", NA)))
#[1] "a==1" "a==1" "b==1" "b==1" "c==1" "c==1"
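Rather than rebuilding the data frame by hand, the NAs can be replaced programmatically. A sketch, assuming the sentinel 2 never occurs as a real value (if it did, or if it were 1, rows would be mislabelled):

```r
df <- data.frame(a = c(1, 1, NA, NA, NA, NA),
                 b = c(NA, NA, 1, 1, NA, NA),
                 c = c(rep(NA, 4), 1, 1))

# is.na() on a data.frame returns a logical matrix, which can be used
# to overwrite every NA at once. Assumption: 2 never occurs in the real data.
df_noNA <- df
df_noNA[is.na(df_noNA)] <- 2

res <- ifelse(df_noNA$a == 1, "a==1",
              ifelse(df_noNA$b == 1, "b==1",
                     ifelse(df_noNA$c == 1, "c==1", NA)))
res
#[1] "a==1" "a==1" "b==1" "b==1" "c==1" "c==1"
```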
However, I was wondering whether there is a more direct way to tell ifelse to ignore NAs. Or is writing a function for & !is.na the most direct way?
# TRUE where the column equals 1, FALSE (never NA) where it is NA
ignorena <- function(column) {
  column == 1 & !is.na(column)
}
ifelse(ignorena(df$a), "a==1",
       ifelse(ignorena(df$b), "b==1",
              ifelse(ignorena(df$c), "c==1", NA)))
#[1] "a==1" "a==1" "b==1" "b==1" "c==1" "c==1"
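A variant of this helper makes any logical test NA-safe instead of hard-coding == 1 inside it. isTRUE() cannot be used for this, since it only handles a single value. The helper name na_false is hypothetical:

```r
# Hypothetical helper: replace NA in a logical vector by FALSE
na_false <- function(test) {
  test[is.na(test)] <- FALSE
  test
}

df <- data.frame(a = c(1, 1, NA, NA, NA, NA),
                 b = c(NA, NA, 1, 1, NA, NA),
                 c = c(rep(NA, 4), 1, 1))

# The comparison stays visible at the call site and can be any test
res <- ifelse(na_false(df$a == 1), "a==1",
              ifelse(na_false(df$b == 1), "b==1",
                     ifelse(na_false(df$c == 1), "c==1", NA)))
res
#[1] "a==1" "a==1" "b==1" "b==1" "c==1" "c==1"
```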
Recommended answer
You can use %in% instead of == to sort of ignore NAs: x %in% 1 returns FALSE, not NA, when x is NA.
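The reason is that base R defines %in% as match(x, table, nomatch = 0) > 0, so a value that is not found in table yields FALSE rather than NA. A short illustration, including the caveat that NA does match an NA inside table:

```r
NA %in% 1
#[1] FALSE

c(1, NA, 2) %in% 1
#[1]  TRUE FALSE FALSE

# Caveat: match() treats NA as matching NA, so this is not a
# general "ignore NA" switch when table itself contains NA
NA %in% c(1, NA)
#[1] TRUE
```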
ifelse(df$a %in% 1, "a==1",
       ifelse(df$b %in% 1, "b==1",
              ifelse(df$c %in% 1, "c==1", NA)))
Unfortunately, this gives no performance gain over the original, while @akrun's solution is about 3 times faster.
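akrun's solution (reproduced in the benchmark code) avoids ifelse entirely. It relies on max.col() and on NA^0 being 1 in R. A small worked example on a three-row data frame made up for illustration:

```r
df <- data.frame(a = c(1, NA, NA),
                 b = c(NA, 1, NA),
                 c = c(NA, NA, NA))

nonNA <- !is.na(df)  # logical matrix, TRUE where a value exists

# Column index of the first TRUE in each row. Row 3 has no TRUE at all,
# yet max.col() still returns a column for it (here 1)
max.col(nonNA, "first")
#[1] 1 2 1

# !rowSums() is TRUE only for rows without any value;
# NA^FALSE is 1 and NA^TRUE is NA, giving a 1-or-NA mask
NA^!rowSums(nonNA)
#[1]  1  1 NA

idx <- max.col(nonNA, "first") * NA^!rowSums(nonNA)
names(df)[idx]
#[1] "a" "b" NA
```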
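# Each solution reads the global df defined below
solution_original <- function(){
  ifelse(df$a==1 & !is.na(df$a), "a==1",
         ifelse(df$b==1 & !is.na(df$b), "b==1",
                ifelse(df$c==1 & !is.na(df$c), "c==1", NA)))
}
solution_akrun <- function(){
  # First non-NA column per row (valid here because every non-NA value is 1);
  # "first" matches the nested ifelse() when several columns are non-NA
  v1 <- names(df)[max.col(!is.na(df), "first") * NA^!rowSums(!is.na(df))]
  i1 <- !is.na(v1)
  v1[i1] <- paste0(v1[i1], "==1")
  v1  # return the full vector, not just the value of the last assignment
}
solution_mine <- function(){
  ifelse(df$a %in% 1, "a==1",
         ifelse(df$b %in% 1, "b==1",
                ifelse(df$c %in% 1, "c==1", NA)))
}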
set.seed(1)
df <- data.frame(a = sample(c(1, rep(NA, 4)), 1e6, TRUE),
                 b = sample(c(1, rep(NA, 4)), 1e6, TRUE),
                 c = sample(c(1, rep(NA, 4)), 1e6, TRUE))
microbenchmark::microbenchmark(
  solution_original(),
  solution_akrun(),
  solution_mine()
)
## Unit: milliseconds
## expr min lq mean median uq max neval
## solution_original() 701.9413 839.3715 845.0720 853.1960 875.6151 1051.6659 100
## solution_akrun() 217.4129 242.5113 293.2987 253.2144 387.1598 564.3981 100
## solution_mine() 698.7628 845.0822 848.6717 858.7892 877.9676 1006.2872 100
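Timings are only comparable if the three approaches return the same result. The random benchmark data contains rows where more than one column is 1; max.col() breaks ties at random by default (ties.method = "random"), whereas the nested ifelse always picks the first column. A small self-contained check on made-up data, using ties.method = "first":

```r
set.seed(42)
# Small random data in the same shape as the benchmark data
df <- data.frame(a = sample(c(1, rep(NA, 4)), 20, TRUE),
                 b = sample(c(1, rep(NA, 4)), 20, TRUE),
                 c = sample(c(1, rep(NA, 4)), 20, TRUE))

nested <- ifelse(df$a %in% 1, "a==1",
                 ifelse(df$b %in% 1, "b==1",
                        ifelse(df$c %in% 1, "c==1", NA)))

# max.col() version with ties resolved towards the first column;
# with the default "random", rows with several 1s may get other labels
v1 <- names(df)[max.col(!is.na(df), "first") * NA^!rowSums(!is.na(df))]
i1 <- !is.na(v1)
v1[i1] <- paste0(v1[i1], "==1")

# as.character() guards against ifelse() returning a logical all-NA vector
identical(as.character(nested), v1)
#[1] TRUE
```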
Inspired by: R: Dealing with TRUE, FALSE, NA and NaN
Edit
Following the comment by @akrun, I redid the benchmark and revised the statement.