Indexing grouped_df object

Problem description

Trying to select a column of an object of class grouped_df by index gives "Error: index out of bounds". For example:

x <- mtcars %>% group_by(am, gear) %>% summarise_each(funs(sum), disp, hp, drat)
class(x)
#    "grouped_df" "tbl_df"     "tbl"        "data.frame"
# For some reason the first column can be selected...
x[1]
#    Source: local data frame [4 x 1]
#    Groups: am
#    am
#     0
#     0
#     1
#     1    
# ...but any index > 1 fails
x[2] 
#   Error: index out of bounds
# Coercing to data frame does the trick...
as.data.frame(x)[2]
#   gear
#      3
#      4
#      4
#      5
#... and so does ungrouping
all(ungroup(x)[2] == as.data.frame(x)[2]) # TRUE

This is using R version 3.1.1 and dplyr 0.3.0.2. I'm not sure whether this is a bug or intentional. Is there any good reason why it works this way? I'd rather not have to remember to ungroup my data frames every time I use dplyr.

Update: Having looked a bit further into this, my guess is that [.grouped_df is defined this way so that the groups are preserved when calling e.g. x[1:3] (which works). However, when the selected columns do not include the grouping variables, the error above is thrown. Perhaps it could be modified so that in this case it calls [.tbl_df and issues a warning at the same time.

Update 2: [.grouped_df has been modified in the development version of dplyr (0.3.0.9000). It still throws an error, but the message is now clearer and names the grouping variables that were not included.

x[2]
# Error in `[.grouped_df`(x, 2) : 
#     cannot group, grouping variables 'am' not included

The best solution I've found so that my code doesn't crash in this situation is to include %>% ungroup at the end of the dplyr command chain.
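A minimal sketch of that workaround, reusing the mtcars example from the question. As an assumption made for portability, summarise() with explicit sums stands in for the summarise_each(funs(sum), ...) call, since those helpers are deprecated in later dplyr releases; the result should be equivalent for this example.

```r
library(dplyr)

# Ending the chain with ungroup() drops the grouping, so positional
# column indexing no longer has to include the grouping variable 'am'.
x <- mtcars %>%
  group_by(am, gear) %>%
  summarise(disp = sum(disp), hp = sum(hp), drat = sum(drat)) %>%
  ungroup()

class(x)  # no longer includes "grouped_df"
x[2]      # selects the gear column without raising an error
```

The trade-off is that the grouping information is lost, so any later grouped operation needs its own group_by() call.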

Recommended answer

On a data frame created with group_by, the [ function cannot subset columns unless the selection includes the grouping variables. See the corresponding dplyr issue for details.
