How can dplyr generate a data frame for each group after the group_by operation?
I was very impressed by how smoothly the dplyr package handles pipeline-style data processing. Recently I ran into a problem: I want to generate a new data frame for each group ID and then combine those small data frames into one larger data frame. A toy example:
input.data.frame %>%
group_by(gid) %>%
{some operation to generate a new data frame for each group} ## FAILED!!!!
In dplyr, mutate() adds new columns to each group and summarise() generates a summary for each group; neither fulfills my requirement. (Did I miss something?)
Alternatively, using ddply() from the plyr package, the previous iteration of dplyr, I can do it with
ddply(input.data.frame, .(gid), function(x) {
  # some operation to generate a new data frame for each group
})
The drawback is that some dplyr functions are masked when I load the plyr package.
Any ideas?
Thanks.
Xiaming
Turning my comment into an answer.
Yes, dplyr offers a way to create a data frame for each group. Calling do() on a grouped data.frame / tbl lets you do this; more precisely, it lets you apply an arbitrary function to each group. This is documented in the help file for do():
[...] You can use do to perform arbitrary computation, returning either a data frame or arbitrary objects which will be stored in a list. This is particularly useful when working with models: you can fit models per group with do and then flexibly extract components with either another do or summarise.
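As a minimal sketch of what the help text describes, the toy pipeline from the question could look roughly like this. The data and the helper make_group_frame() are made up for illustration; only group_by(), do() and ungroup() come from dplyr. Note that recent dplyr releases mark do() as superseded, so this reflects the API as discussed in this answer.

```r
library(dplyr)

# Toy input (hypothetical data, not from the question)
input.data.frame <- data.frame(
  gid   = c("a", "a", "a", "b", "b"),
  value = c(1, 2, 3, 10, 20)
)

# Hypothetical per-group operation: it returns a NEW data frame whose shape
# differs from the group's rows, which mutate()/summarise() were not meant for.
make_group_frame <- function(df) {
  data.frame(
    stat  = c("min", "max"),
    value = c(min(df$value), max(df$value))
  )
}

result <- input.data.frame %>%
  group_by(gid) %>%
  do(make_group_frame(.)) %>%   # `.` is the data frame of the current group
  ungroup()

print(result)
```

Because the expression inside do() is unnamed and returns a data frame, dplyr binds the per-group results together and prepends the grouping column, so no separate combine step should be needed.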
My experience so far is that whenever one of the specialised dplyr functions (mutate / summarise / mutate_each / etc.) can do the job, it should be preferred over do(): they are often more efficient, though of course not as flexible.
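On the masking concern raised in the question: one possible workaround, sketched below under the assumption that plyr is installed, is to call ddply() through its namespace instead of attaching plyr with library(). The data and the per-group function are hypothetical. The grouping variable is passed as a character string here, so plyr's .() helper is not needed either.

```r
library(dplyr)  # plyr is installed but deliberately NOT attached

# Toy input (hypothetical data)
input.data.frame <- data.frame(
  gid   = c("a", "a", "b"),
  value = c(1, 2, 10)
)

# Namespace-qualified call: nothing from plyr is attached to the search path,
# so dplyr's functions are not masked.
result <- plyr::ddply(input.data.frame, "gid", function(x) {
  # hypothetical operation producing a new data frame for this group
  data.frame(total = sum(x$value))
})

print(result)
```

This keeps the dplyr verbs usable as before while still reaching the one plyr function that is needed.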