Efficient assignment of a function with multiple outputs in dplyr mutate or summarise

Question

I've noticed a lot of examples here which use dplyr::mutate in combination with a function returning multiple outputs to create multiple columns. For example:

tmp <- mtcars %>%
    group_by(cyl) %>%
    summarise(min = summary(mpg)[1],
              median = summary(mpg)[3],
              mean = summary(mpg)[4],
              max = summary(mpg)[6])

Such syntax, however, means that summary() is called four times in this example, which does not seem particularly efficient. What ways are there to efficiently assign a list output to a list of column names in summarise or mutate?

For example, from a previous question ("Split a data frame column containing a list into multiple columns using dplyr (or otherwise)"), I know that you can assign the output of summary as a list and then split it using do(data.frame(...)). However, this means that you then have to add the column names afterwards, and the syntax is not as pretty.
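For illustration, the do(data.frame(...)) approach mentioned above might look roughly like the sketch below. The exact expression inside data.frame() is an assumption made for this example, not code taken from the linked question.

```r
library(dplyr)

# summary() runs once per group; t() turns its six named values
# into a one-row matrix that data.frame() converts to one row per group.
tmp <- mtcars %>%
    group_by(cyl) %>%
    do(data.frame(t(unclass(summary(.$mpg)))))

# The columns inherit summary()'s labels (altered by data.frame()'s
# name checking), so they still need renaming in a separate step.
names(tmp)
```

This calls summary() only once per group, but it shows the drawback described above: the column names come from summary() rather than from the call itself.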

Answer

This addresses your example, but perhaps not your principal question. In the case you showed, you could rewrite this as:

tmp <- mtcars %>%
    group_by(cyl) %>%
    summarise_each(funs(min, median, mean, max), mpg)

This is more efficient, taking about 40% as much time to run:

microbenchmark(mtcars %>%
                 group_by(cyl) %>%
                 summarise_each(funs(min, median, mean, max), mpg),
               times = 1000L)


 mtcars %>% group_by(cyl) %>% summarise_each(funs(min, median,mean, max), mpg)
      min       lq     mean   median       uq      max neval
 2.002762 2.159464 2.330703 2.216719 2.271264 7.771477  1000


microbenchmark(mtcars %>%
    group_by(cyl) %>%
    summarise(min = summary(mpg)[1],
              median = summary(mpg)[3],
              mean = summary(mpg)[4],
              max = summary(mpg)[6]), times = 1000L)

 mtcars %>% group_by(cyl) %>% summarise(min = summary(mpg)[1], median = summary(mpg)[3], mean = summary(mpg)[4], max = summary(mpg)[6])
      min      lq     mean   median       uq      max neval
 4.967731 5.21122 5.571605 5.360689 5.530197 13.26596  1000

However, there are certainly other cases where this will not address the problem.
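Note that summarise_each() and funs() have been deprecated in later dplyr releases. Assuming dplyr 1.0.0 or newer is installed, the same rewrite can be sketched with across(). The function labels in the list are chosen here for illustration.

```r
library(dplyr)

# Apply each function in the named list to mpg within every cyl group.
# With the default naming scheme the result columns are
# mpg_min, mpg_median, mpg_mean and mpg_max.
tmp <- mtcars %>%
    group_by(cyl) %>%
    summarise(across(mpg, list(min = min, median = median,
                               mean = mean, max = max)))
```

As with the summarise_each() version, this computes each statistic directly instead of calling summary() repeatedly, so the same caveat applies when the function of interest cannot be split into separate pieces.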

EDIT:

The do() function can solve this, for example:

by_cyl <- group_by(mtcars, cyl) %>%
        do(mod = summary(.)[c(1,4,6),])
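The do() call above stores its result in a list column named mod. As an alternative sketch, and assuming dplyr 1.0.0 or newer, an unnamed expression inside summarise() may return a data frame whose columns are spliced into the result. The helper summary_cols() below is a hypothetical name introduced for this example. It calls summary() once per group and names the output columns in the same step.

```r
library(dplyr)

# Hypothetical helper: one summary() call, four named output columns.
summary_cols <- function(x) {
    s <- unclass(summary(x))  # plain named numeric vector
    tibble(min    = s[["Min."]],
           median = s[["Median"]],
           mean   = s[["Mean"]],
           max    = s[["Max."]])
}

# The unnamed data frame returned by summary_cols() is unpacked
# into separate columns, one row per cyl group.
tmp <- mtcars %>%
    group_by(cyl) %>%
    summarise(summary_cols(mpg))
```

The result has the columns cyl, min, median, mean and max, which is the shape the question asks for, without a separate renaming step.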
