How to replicate a ddply behavior that uses a custom function with dplyr?
Problem description
I'm trying to replace all my plyr calls with dplyr. There are still a few snags, and one of them is with the group_by function. I imagine it acts the same way as the second ddply argument and does a split, apply and combine based on the grouping variables I list. But that doesn't appear to be the case. Here is a rather trivial example.

Let's define a silly function:

    mm <- function(x) return(x[1:5, ])

Now we can split the iris dataset by species like so and apply this function to each piece:

    ddply(iris, .(Species), mm)

This works as intended. However, when I try the same with dplyr, it doesn't work as expected:

    iris %>% group_by(Species) %>% mm

What am I doing wrong?
Solution

As shown in ?do, you can refer to a group with . in your expression. The following will replicate your ddply output:

    iris %>% group_by(Species) %>% do(.[1:5, ])
    # Source: local data frame [15 x 5]
    # Groups: Species
    #
    #    Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
    # 1           5.1         3.5          1.4         0.2     setosa
    # 2           4.9         3.0          1.4         0.2     setosa
    # 3           4.7         3.2          1.3         0.2     setosa
    # 4           4.6         3.1          1.5         0.2     setosa
    # 5           5.0         3.6          1.4         0.2     setosa
    # 6           7.0         3.2          4.7         1.4 versicolor
    # 7           6.4         3.2          4.5         1.5 versicolor
    # 8           6.9         3.1          4.9         1.5 versicolor
    # 9           5.5         2.3          4.0         1.3 versicolor
    # 10          6.5         2.8          4.6         1.5 versicolor
    # 11          6.3         3.3          6.0         2.5  virginica
    # 12          5.8         2.7          5.1         1.9  virginica
    # 13          7.1         3.0          5.9         2.1  virginica
    # 14          6.3         2.9          5.6         1.8  virginica
    # 15          6.5         3.0          5.8         2.2  virginica
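To see why the pipeline in the question falls short, here is a minimal sketch, assuming a reasonably recent dplyr is installed. `x %>% mm` is simply `mm(x)`, so the function is called once on the whole grouped table. `group_by()` only records the grouping; it is verbs such as `do()` that evaluate an expression once per group.

```r
library(dplyr)

mm <- function(x) return(x[1:5, ])

# Piping straight into mm() calls it once on the entire grouped table,
# so only the first five rows of iris (all setosa) come back.
whole <- iris %>% group_by(Species) %>% mm()

# do() evaluates mm() once per group and row-binds the pieces.
per_group <- iris %>% group_by(Species) %>% do(mm(.))

nrow(whole)      # 5
nrow(per_group)  # 15
```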
More generally, to apply a custom function to groups with dplyr, you can do something like the following (thanks @docendodiscimus):

    iris %>% group_by(Species) %>% do(mm(.))
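As an aside that is not part of the original answer: newer dplyr releases mark `do()` as superseded. Assuming dplyr 1.0.0 or later, the same result could be sketched with `group_modify()` for an arbitrary function, or with `slice_head()` for this particular first-n-rows case. The names `via_modify` and `via_slice` are only illustrative.

```r
library(dplyr)

mm <- function(x) return(x[1:5, ])

# group_modify() passes each group's rows (without the grouping column) as .x
# and expects a data frame back; the grouping column is re-attached in front.
via_modify <- iris %>% group_by(Species) %>% group_modify(~ mm(.x))

# For the specific "first five rows per group" case no custom function is needed.
via_slice <- iris %>% group_by(Species) %>% slice_head(n = 5)
```

Both results should contain five rows per species. Note that `group_modify()` places Species first, so the column order differs from the `do()` output.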