Efficient assignment of a function with multiple outputs in dplyr mutate or summarise
Problem description
I've noticed a lot of examples here that use dplyr::mutate in combination with a function returning multiple outputs to create multiple columns. For example:

    tmp <- mtcars %>%
      group_by(cyl) %>%
      summarise(min = summary(mpg)[1],
                median = summary(mpg)[3],
                mean = summary(mpg)[4],
                max = summary(mpg)[6])
Such syntax, however, means that the summary function is called 4 times in this example, which does not seem particularly efficient. What ways are there to efficiently assign a list output to a list of column names in summarise or mutate?

For example, from a previous question (Split a data frame column containing a list into multiple columns using dplyr (or otherwise)), I know that you can assign the output of summary as a list and then split it using do(data.frame(...)). However, this means that you then have to add the column names later, and the syntax is not as pretty.

Solution
This addresses your example, but perhaps not your principal question. In the case you showed, you could rewrite this as:

    tmp <- mtcars %>%
      group_by(cyl) %>%
      summarise_each(funs(min, median, mean, max), mpg)

This is more efficient, taking about 40% as much time to run:

    microbenchmark(mtcars %>%
                     group_by(cyl) %>%
                     summarise_each(funs(min, median, mean, max), mpg),
                   times = 1000L)

    mtcars %>% group_by(cyl) %>% summarise_each(funs(min, median, mean, max), mpg)
         min       lq     mean   median       uq      max neval
    2.002762 2.159464 2.330703 2.216719 2.271264 7.771477  1000

    microbenchmark(mtcars %>%
                     group_by(cyl) %>%
                     summarise(min = summary(mpg)[1],
                               median = summary(mpg)[3],
                               mean = summary(mpg)[4],
                               max = summary(mpg)[6]),
                   times = 1000L)

    mtcars %>% group_by(cyl) %>% summarise(min = summary(mpg)[1], median = summary(mpg)[3], mean = summary(mpg)[4], max = summary(mpg)[6])
         min      lq     mean   median       uq      max neval
    4.967731 5.21122 5.571605 5.360689 5.530197 13.26596  1000

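A side note for readers on newer dplyr releases: summarise_each() and funs() have since been deprecated. A minimal sketch of what I believe is the closest current equivalent, assuming dplyr 1.0.0 or later, uses across(). The .names argument is my own choice here, made so the output columns are called min, median, mean and max.

```r
library(dplyr)

# Assumes dplyr >= 1.0.0, where across() is available.
# Each function is applied once per group to the mpg column.
tmp <- mtcars %>%
  group_by(cyl) %>%
  summarise(across(mpg,
                   list(min = min, median = median, mean = mean, max = max),
                   .names = "{.fn}"))  # name columns after the function only
```

Without .names, the columns would be named mpg_min, mpg_median and so on.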
However, there are certainly other cases where this will not address the problem.
EDIT:
The do() function can solve this, e.g.:

    by_cyl <- group_by(mtcars, cyl) %>%
      do(mod = summary(.)[c(1, 4, 6), ])
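
If the goal is to call summary() only once per group and still get named columns, one option is a small helper that returns a one-row data frame. This is only a sketch: summarise_mpg is a hypothetical name (not part of dplyr), and the positions 1, 3, 4 and 6 are taken from the question's example.

```r
library(dplyr)

# Hypothetical helper: runs summary() once and returns a one-row
# data frame whose column names are set in one place.
summarise_mpg <- function(x) {
  s <- summary(x)
  data.frame(min = s[[1]], median = s[[3]], mean = s[[4]], max = s[[6]])
}

# do() binds the one-row results together and adds the grouping column.
tmp <- mtcars %>%
  group_by(cyl) %>%
  do(summarise_mpg(.$mpg))
```

On dplyr 1.0.0 or later, I believe the same helper can also be called directly as summarise(summarise_mpg(mpg)), since an unnamed data frame result is unpacked into columns.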