dplyr error: strange issue when combining group_by, mutate and ifelse. Is it a bug?
I am having a strange issue with dplyr when combining group_by, mutate and ifelse. Consider the following data.frame:
> df1
crawl.id group.id hits.diff
1 1 1 NA
2 1 2 NA
3 2 2 0
4 1 3 NA
5 1 3 NA
6 1 3 NA
When I run the following code:
library(dplyr)
df1 %>%
group_by(group.id) %>%
mutate( hits.consumed = ifelse(hits.diff<=0,-hits.diff,0) )
For some reason I get:
Error: incompatible types, expecting a logical vector
However, if I remove either group_by() or ifelse, everything works as expected:
df1 %>%
mutate( hits.consumed = ifelse(hits.diff<=0,-hits.diff,0) )
crawl.id group.id hits.diff hits.consumed
1 1 1 NA NA
2 1 2 NA NA
3 2 2 0 0
4 1 3 NA NA
5 1 3 NA NA
6 1 3 NA NA
df1 %>%
group_by( group.id ) %>%
mutate( hits.consumed = -hits.diff )
crawl.id group.id hits.diff hits.consumed
1 1 1 NA NA
2 1 2 NA NA
3 2 2 0 0
4 1 3 NA NA
5 1 3 NA NA
6 1 3 NA NA
Is it a bug or a feature? Can anyone replicate this? What's so special about that specific combination of group_by, mutate and ifelse that makes it fail?
My own research led me here: https://github.com/hadley/dplyr/issues/464, which suggests that it should be fixed by now.
Here is dput(df1):
structure(list(crawl.id = c(1, 1, 2, 1, 1, 1), group.id = structure(c(1L,
2L, 2L, 3L, 3L, 3L), .Label = c("1", "2", "3"), class = "factor"),
hits.diff = c(NA, NA, 0, NA, NA, NA)), .Names = c("crawl.id",
"group.id", "hits.diff"), row.names = c(NA, -6L), class = "data.frame")
Wrap it all in as.numeric to force the output type, so that the NAs, which are logical by default, don't override the class of the output variable:
df1 %>%
group_by(group.id) %>%
mutate( hits.consumed = as.numeric(ifelse(hits.diff<=0,-hits.diff,0)) )
# crawl.id group.id hits.diff hits.consumed
#1 1 1 NA NA
#2 1 2 NA NA
#3 2 2 0 0
#4 1 3 NA NA
#5 1 3 NA NA
#6 1 3 NA NA
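To see what as.numeric changes, here is a minimal base R sketch (no dplyr needed). The vector x_all_na is a made-up stand-in for a group whose hits.diff values are all NA; the names raw and fixed are only for illustration.

```r
# Hypothetical stand-in for one group in which every hits.diff is NA
x_all_na <- c(NA_real_, NA_real_)

# The test is all NA, so ifelse never picks from yes or no and returns logical NAs
raw <- ifelse(x_all_na <= 0, -x_all_na, 0)

# Wrapping in as.numeric coerces the logical NAs to double NAs
fixed <- as.numeric(raw)

print(typeof(raw))    # "logical"
print(typeof(fixed))  # "double"
```

With the wrapper, every group yields a double vector, so the per-group results have one consistent type.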
I'm pretty sure this is the same issue as in "Custom sum function in dplyr returns inconsistent results", as this result suggests:
out <- df1[1:2,] %>% mutate( hits.consumed = ifelse(hits.diff <= 0, -hits.diff, 0))
class(out$hits.consumed)
#[1] "logical"
out <- df1[1:3,] %>% mutate( hits.consumed = ifelse(hits.diff <= 0, -hits.diff, 0))
class(out$hits.consumed)
#[1] "numeric"
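The same check can be run per group in base R, which is roughly what a grouped mutate has to deal with. This is only a sketch: it uses tapply rather than dplyr, rebuilds df1 from the dput output above, and consumed_class is a helper name invented for this example. It shows the class of each group's result, not how dplyr combines them internally.

```r
# Rebuild df1 from the question's dput output
df1 <- data.frame(
  crawl.id  = c(1, 1, 2, 1, 1, 1),
  group.id  = factor(c(1, 2, 2, 3, 3, 3)),
  hits.diff = c(NA, NA, 0, NA, NA, NA)
)

# Hypothetical helper: class of the ifelse result for one group's values
consumed_class <- function(x) class(ifelse(x <= 0, -x, 0))

# Apply the helper to hits.diff within each level of group.id
classes <- tapply(df1$hits.diff, df1$group.id, consumed_class)
print(classes)
```

Groups 1 and 3 contain only NA and come back as "logical", while group 2 contains a 0 and comes back as "numeric". That mismatch between groups is presumably what triggers the "expecting a logical vector" error.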