Error selecting a column after grouping the data frame using group_by from dplyr 0.3.02
Problem description
I cannot select the second column after grouping the data.frame:
library(dplyr) # needed for group_by below
d <- data.frame(x = 1:10, y = runif(1))
d[,2] # selects the second column
d <- group_by(d, x)
d[,2] # produces the error: index out of bounds
Recommended answer
I think this is intended behavior in dplyr for a grouped_df object: the logic is that the grouping variable(s) cannot be dropped while the data is still grouped. Consider this example, where I use dplyr's select function to extract variables from a grouped_df:
require(dplyr)
d <- data.frame(x = 1:10, y = runif(1), z = rnorm(2))
d <- group_by(d, x)
select(d, y)
#Source: local data frame [10 x 2]
#Groups: x
#
# x y
#1 1 0.5861766
#2 2 0.5861766
#3 3 0.5861766
#4 4 0.5861766
#5 5 0.5861766
#6 6 0.5861766
#7 7 0.5861766
#8 8 0.5861766
#9 9 0.5861766
#10 10 0.5861766
You can see that the result includes the grouping variable even though it was not specified in the select call.
select(d, z) # would work the same way
Even if you explicitly excluded the grouping variable "x", it would still be returned when using select:
select(d, -x)
#Source: local data frame [10 x 3]
#Groups: x
#
# x y z
#1 1 0.2110696 2.4393919
#2 2 0.2110696 0.8400083
#3 3 0.2110696 2.4393919
#4 4 0.2110696 0.8400083
#5 5 0.2110696 2.4393919
#6 6 0.2110696 0.8400083
#7 7 0.2110696 2.4393919
#8 8 0.2110696 0.8400083
#9 9 0.2110696 2.4393919
#10 10 0.2110696 0.8400083
To get only the "y" column, you would need to ungroup the data first:
ungroup(d) %>% select(y)
#Source: local data frame [10 x 1]
#
# y
#1 0.5861766
#2 0.5861766
#3 0.5861766
#4 0.5861766
#5 0.5861766
#6 0.5861766
#7 0.5861766
#8 0.5861766
#9 0.5861766
#10 0.5861766
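If the goal is the column's values rather than a one-column table, list-style extraction may be another option, since a grouped data frame is still a list of columns. The following is a minimal sketch, assuming a reasonably recent dplyr; the names `y_vec` and `y_tbl` are illustrative and not part of the original answer:

```r
library(dplyr)

d <- data.frame(x = 1:10, y = runif(1), z = rnorm(2))
d <- group_by(d, x)

# `[[` and `$` extract a single column as a plain vector,
# so the grouping metadata is not carried along
y_vec <- d[["y"]]
identical(y_vec, d$y)

# ungroup first when a one-column table is wanted instead
y_tbl <- select(ungroup(d), y)
names(y_tbl)
```

The vector form is convenient for passing the values to base R functions, while the ungroup-then-select form keeps the result as a table.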
Note that you could use any subsetting with [ that includes the grouping variable(s), for example:
d[, 1:2]
Or
d[, c(1,3)]
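One way to confirm that such a subset keeps its grouping is to inspect the result directly. This is only a sketch, assuming a dplyr version that provides `group_vars()` (it is not available in very old releases); the name `sub` is illustrative:

```r
library(dplyr)

d <- data.frame(x = 1:10, y = runif(1), z = rnorm(2))
d <- group_by(d, x)

# column 1 is the grouping variable x, so it is kept alongside z
sub <- d[, c(1, 3)]
names(sub)
group_vars(sub)
```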