How can dplyr generate a data frame for each group after the group_by operation?
Question
I was very impressed by how smoothly the dplyr package handles pipeline-style data processing. Recently I ran into a problem: I need to generate a new data frame for each group ID and combine those small data frames into a final, larger data frame. A toy example:
input.data.frame %>%
group_by(gid) %>%
{some operation to generate a new data frame for each group} ## FAILED!!!!
In dplyr, the function mutate adds new columns to each group and summarise generates a summary for each group; neither fulfills my requirement. (Did I miss something?)
Alternatively, using ddply from the plyr package, the previous iteration of dplyr, I can do it via
ddply(input.data.frame, .(gid), function(x) {
  # some operation to generate a new data frame for each group
})
But the drawback is that some dplyr functions are masked when I load the plyr package.
Recommended answer
From my comment:
Yes, dplyr offers a way to create data frames for each group. Using the do function on a grouped data.frame / tbl lets you do this; more precisely, it lets you apply arbitrary functions to each group. This is documented in the help file for do:
[...] You can use do to perform arbitrary computation, returning either a data frame or arbitrary objects which will be stored in a list. This is particularly useful when working with models: you can fit models per group with do and then flexibly extract components with either another do or summarise.
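As a rough illustration of the pattern described above, here is a minimal sketch. The toy data, the column x, and the per-group operation (a small min/max table) are all made up for this example and are not from the original question; only gid and input.data.frame follow the question's naming.

```r
library(dplyr)

# Hypothetical toy data: two groups identified by gid
input.data.frame <- data.frame(
  gid = c("a", "a", "a", "b", "b"),
  x   = c(1, 2, 3, 10, 20)
)

# Hypothetical per-group operation: build a new two-row data frame per group.
# Inside do(), `.` is the data frame holding the rows of the current group.
result <- input.data.frame %>%
  group_by(gid) %>%
  do(data.frame(stat = c("min", "max"), value = c(min(.$x), max(.$x))))

# The per-group data frames are row-bound into one larger data frame,
# with the grouping column gid kept alongside the new columns.
print(result)
```

Because the expression passed to do is unnamed and returns a data frame, the pieces are combined automatically. Note that more recent dplyr releases mark do as superseded, although it remains available.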
My experience so far is that whenever it is possible to use one of the specialised dplyr functions like mutate / summarise / mutate_each / etc., they should be preferred over do, because they are often more efficient than do, though of course not as flexible.
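To make the comparison concrete, the sketch below computes the same per-group mean once with summarise and once with do. The toy data and the column names x and mean_x are assumptions made for this illustration only.

```r
library(dplyr)

# Hypothetical toy data: two groups identified by gid
input.data.frame <- data.frame(
  gid = c("a", "a", "a", "b", "b"),
  x   = c(1, 2, 3, 10, 20)
)

# Specialised verb: one summary row per group
via_summarise <- input.data.frame %>%
  group_by(gid) %>%
  summarise(mean_x = mean(x))

# Same result via do(): more flexible, but needs an explicit data frame
via_do <- input.data.frame %>%
  group_by(gid) %>%
  do(data.frame(mean_x = mean(.$x)))
```

Both give one row per group here; do only becomes necessary when the per-group result cannot be expressed with the specialised verbs, for example when each group has to yield several rows of a new shape.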