Error selecting a column after grouping the dataframe using group_by from dplyr 0.3.02

Problem description

I cannot select the second column after grouping the data.frame:

library(dplyr)

d <- data.frame(x = 1:10, y = runif(1))
d[,2] # selects the second column
d <- group_by(d, x)
d[,2] # produces the error: index out of bounds


Recommended answer

I think this is intended behavior in dplyr for a grouped_df object, the logic being that the grouping variable(s) cannot be dropped while the data is still grouped. Consider this example, where I use dplyr's select function to extract variables from a grouped_df:

library(dplyr)
d <- data.frame(x = 1:10, y = runif(1), z = rnorm(2))
d <- group_by(d, x)

select(d, y)  
#Source: local data frame [10 x 2]
#Groups: x
#
#    x         y
#1   1 0.5861766
#2   2 0.5861766
#3   3 0.5861766
#4   4 0.5861766
#5   5 0.5861766
#6   6 0.5861766
#7   7 0.5861766
#8   8 0.5861766
#9   9 0.5861766
#10 10 0.5861766

You can see that the result includes the grouping variable even though it was not specified in the select call.

select(d, z) # would work the same way
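
A quick way to confirm this without reading the printed table is to inspect the column names and grouping of the result. This is only a minimal sketch, assuming dplyr is installed; the name res is illustrative and not part of the original answer, and newer dplyr releases may additionally print a message about adding the missing grouping variable.

```r
library(dplyr)

d <- data.frame(x = 1:10, y = runif(1), z = rnorm(2))
d <- group_by(d, x)

# Ask only for y; the grouping variable x is carried along anyway
res <- select(d, y)

names(res)   # "x" "y"
groups(res)  # the result is still grouped by x
```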

Even if you explicitly exclude the grouping variable "x", it is still returned when using select:

select(d, -x)
#Source: local data frame [10 x 3]
#Groups: x
#
#    x         y         z
#1   1 0.2110696 2.4393919
#2   2 0.2110696 0.8400083
#3   3 0.2110696 2.4393919
#4   4 0.2110696 0.8400083
#5   5 0.2110696 2.4393919
#6   6 0.2110696 0.8400083
#7   7 0.2110696 2.4393919
#8   8 0.2110696 0.8400083
#9   9 0.2110696 2.4393919
#10 10 0.2110696 0.8400083

To get only the "y" column, you would need to ungroup the data first:

ungroup(d) %>% select(y)
#Source: local data frame [10 x 1]
#
#           y
#1  0.5861766
#2  0.5861766
#3  0.5861766
#4  0.5861766
#5  0.5861766
#6  0.5861766
#7  0.5861766
#8  0.5861766
#9  0.5861766
#10 0.5861766
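
If the grouping is still needed afterwards, one option is to ungroup only inside the extraction step, since ungroup returns a new object and leaves d itself untouched. The sketch below assumes dplyr is installed; y_only and y_vec are hypothetical names used for illustration.

```r
library(dplyr)

d <- data.frame(x = 1:10, y = runif(1), z = rnorm(2))
d <- group_by(d, x)

# Ungroup on the fly; d itself is not modified
y_only <- d %>% ungroup() %>% select(y)

names(y_only)   # "y"
groups(d)       # d is still grouped by x

# If a plain vector is wanted rather than a one-column data frame
y_vec <- ungroup(d)$y
length(y_vec)   # 10
```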

Note that you could use any subsetting with [ that includes the grouping variable(s), for example:

d[, 1:2]



Or

d[, c(1,3)]
