Long variable name fails in dplyr
Problem description
Variable names longer than 39 characters fail in dplyr, returning the error: "Error: index out of bounds".
Am I missing something or is this a bug?
A 40-character name does not work:
library(dplyr)
names(iris)[5] <- "vvv_5vvv10vvv15vvv20vvv25vvv30vvv35vvv40"
iris %>% dplyr::group_by( vvv_5vvv10vvv15vvv20vvv25vvv30vvv35vvv40 ) %>%
dplyr::summarise( n() )
This gives me the error:
Error: index out of bounds
A 39-character name works:
names(iris)[5] <- "vvv_5vvv10vvv15vvv20vvv25vvv30vvv35vv39"
iris %>% dplyr::group_by( vvv_5vvv10vvv15vvv20vvv25vvv30vvv35vv39 ) %>%
dplyr::summarise( n() )
This works fine and gives me the desired output:
Source: local data frame [3 x 2]
vvv_5vvv10vvv15vvv20vvv25vvv30vvv35vv39 n()
1 setosa 50
2 versicolor 50
3 virginica 50
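The two names above differ in length by a single character, which is easy to miss by eye. A minimal check with base R's nchar() is sketched here; the variable names name_39 and name_40 are introduced only for this illustration.

```r
# The two column names used in the examples above.
name_39 <- "vvv_5vvv10vvv15vvv20vvv25vvv30vvv35vv39"
name_40 <- "vvv_5vvv10vvv15vvv20vvv25vvv30vvv35vvv40"

# Count the characters in each name.
name_lengths <- nchar(c(name_39, name_40))
print(name_lengths)

# Flag any column name in a data frame that is 40 characters or longer.
df <- iris
names(df)[5] <- name_40
too_long <- names(df)[nchar(names(df)) >= 40]
print(too_long)
```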
SessionInfo()
> sessionInfo()
R version 3.1.1 (2014-07-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
locale:
[1] LC_COLLATE=Danish_Denmark.1252 LC_CTYPE=Danish_Denmark.1252 LC_MONETARY=Danish_Denmark.1252 LC_NUMERIC=C
[5] LC_TIME=Danish_Denmark.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] dplyr_0.3.0.2
loaded via a namespace (and not attached):
[1] assertthat_0.1 DBI_0.3.1 lazyeval_0.1.9 magrittr_1.0.1 parallel_3.1.1 Rcpp_0.11.3 tools_3.1.1
Solution
This seems to be a known issue, to be fixed in dplyr 0.3.1. From the reply by @romainfrancois in the post:
"It happens here [...]
new_groups <- lazyeval::auto_name(new_groups)
because:
lazyeval::auto_name
function (x, max_width = 40)
{
names(x) <- auto_names(x, max_width = max_width)
x
}
<environment: namespace:lazyeval>
"
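To illustrate why a width limit of 40 could produce this symptom, here is a minimal base R sketch. It assumes that a label reaching max_width characters gets shortened, so the generated group name no longer matches any column. The helper truncate_label() is hypothetical and is not lazyeval's actual code; the exact truncation rule is an assumption.

```r
# Hypothetical helper, NOT lazyeval's actual implementation: shorten any
# label that reaches `max_width` characters and mark the cut with "...".
truncate_label <- function(label, max_width = 40) {
  if (nchar(label) < max_width) {
    return(label)
  }
  paste0(substr(label, 1, max_width - 3), "...")
}

name_39 <- "vvv_5vvv10vvv15vvv20vvv25vvv30vvv35vv39"
name_40 <- "vvv_5vvv10vvv15vvv20vvv25vvv30vvv35vvv40"

df <- iris
names(df)[5] <- name_40

# The 39-character name passes through unchanged.
kept_39 <- identical(truncate_label(name_39), name_39)

# The 40-character name is altered, so a lookup by the generated
# label no longer finds the column.
found_40 <- truncate_label(name_40) %in% names(df)

print(c(kept_39 = kept_39, found_40 = found_40))
```

Under that assumption, the 39-character name survives while the 40-character one does not, which matches the behavior reported in the question.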
Update
In dplyr 0.4.0, "group_by() supports variables with more than 39 characters thanks to a fix in lazyeval":
library(dplyr)
# Variable name with 40 characters
names(iris)[5] <- "vvv_5vvv10vvv15vvv20vvv25vvv30vvv35vvv40"
iris %>%
group_by(vvv_5vvv10vvv15vvv20vvv25vvv30vvv35vvv40) %>%
summarise(n())
# vvv_5vvv10vvv15vvv20vvv25vvv30vvv35vvv40 n()
# 1 setosa 50
# 2 versicolor 50
# 3 virginica 50
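To find out whether a given installation predates the fix, the version can be compared against 0.4.0. The sketch below uses only base R and utils; the helper has_long_name_fix() is a hypothetical name introduced for this example, and the 0.4.0 threshold is taken from the update above.

```r
# Hypothetical helper: TRUE if a dplyr version is at least 0.4.0,
# the release the update above says handles long variable names.
has_long_name_fix <- function(version) {
  package_version(version) >= "0.4.0"
}

# The version from the question's sessionInfo() predates the fix.
old_ok <- has_long_name_fix("0.3.0.2")
new_ok <- has_long_name_fix("0.4.0")
print(c(old_ok = old_ok, new_ok = new_ok))

# Check the locally installed version, if dplyr is available.
if (requireNamespace("dplyr", quietly = TRUE)) {
  installed <- as.character(utils::packageVersion("dplyr"))
  message("Installed dplyr ", installed,
          if (has_long_name_fix(installed)) " includes" else " predates",
          " the 0.4.0 release.")
}
```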