Assigning names to the list output of dplyr do operation


Problem description

The do function in the dplyr package usually produces a list. Is there a way to assign names to that list depending on the input to do? Specifically, I pass it the result of group_by and would like the names of the list to indicate which group each list element corresponds to.

Here is a toy example of what I want to achieve:

> it = data.frame(ind=c("a","a","b","b","c"),var1=c(1,2,3,4,5), var2=c(2,3,4,2,2))
> group_by(it,ind)%.%summarise(min(var1))
Source: local data frame [3 x 2]

  ind min(var1)
1   c         5
2   b         3
3   a         1

Now with do:

> do(group_by(it,ind),function(x)min(x[,"var1"]))
[[1]]
[1] 5

[[2]]
[1] 3

[[3]]
[1] 1

Ideally the names should be c("c","b","a").

Is this possible? And why does dplyr reverse the order of the groups? Note that in my case the result of the do operation is an lm object.

Edit: A comment asked for a realistic example, so here is what I had in mind. I fit models depending on the data (dummy code):

res <- do(group_by(data,Index),lm,formula=y~x)

Now I want to do various things like

sapply(res,coef)

So I want to relate the results to the original dataset, in this case to know which Index each set of coefficients corresponds to.

Edit 2: The desired behaviour can be achieved with the dlply function from plyr:

dlply(it,~ind,function(d)min(d[,"var1"]))

$a
[1] 1

$b
[1] 3

$c
[1] 5

attr(,"split_type")
[1] "data.frame"
attr(,"split_labels")
  ind
1   a
2   b
3   c

I would like to know whether it is possible to replicate this behaviour with dplyr, preferably with minimal intervention.

Recommended answer

Try this modified version of do.grouped_df:

do2 <- function (.data, .f, ...) {
    if (is.null(attr(.data, "indices"))) {
        .data <- dplyr:::grouped_df_impl(.data, attr(.data, "vars"), 
            attr(.data, "drop"))
    }
    index <- attr(.data, "indices")
    out <- vector("list", length(index))
    for (i in seq_along(index)) {
        subs <- .data[index[[i]] + 1L, , drop = FALSE]
        out[[i]] <- .f(subs, ...)
    }
    nms <- as.character(attr(.data, "labels")[[1]])
    setNames(out, nms)
}

library(dplyr)

it %.% group_by(ind) %.% do2(function(x) min(x$var1))

which gives:

$a
[1] 1

$b
[1] 3

$c
[1] 5
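
The loop inside do2 can be illustrated without touching dplyr internals. The following is only a sketch under stated assumptions: it builds by hand the two pieces that do2 reads from the grouped data frame's attributes, namely zero-based row positions per group and one label per group. The variables index and labels are hypothetical local stand-ins, not part of the dplyr API, and the second column is assumed to be called var2.

```r
it <- data.frame(ind  = c("a", "a", "b", "b", "c"),
                 var1 = c(1, 2, 3, 4, 5),
                 var2 = c(2, 3, 4, 2, 2))

# Hypothetical stand-ins for the "labels" and "indices" attributes:
# one label per group, and zero-based row positions for each group.
labels <- sort(unique(as.character(it$ind)))
index  <- lapply(labels, function(g) which(it$ind == g) - 1L)

out <- vector("list", length(index))
for (i in seq_along(index)) {
  # Add 1 to convert the zero-based positions to R's one-based rows.
  subs <- it[index[[i]] + 1L, , drop = FALSE]
  out[[i]] <- min(subs$var1)
}

# Attach the group labels as list names, as do2 does with setNames().
out <- setNames(out, labels)
```

The naming step only works because labels and index are built in the same group order; do2 relies on the same correspondence between the two attributes.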



It could also be combined with fn$ from the gsubfn package like this to shorten it slightly:

library(dplyr)
library(gsubfn)

it %.% group_by(ind) %.% fn$do2(~ min(x$var1))

giving the same answer.
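
For comparison, a named list per group can also be obtained in base R with split and lapply, which does not depend on dplyr internals. This is a different technique from the do2 function above, offered only as a sketch. The second part uses the built-in mtcars data as a hypothetical stand-in for the per-group lm case, since the real data and Index column from the question are not available.

```r
it <- data.frame(ind  = c("a", "a", "b", "b", "c"),
                 var1 = c(1, 2, 3, 4, 5),
                 var2 = c(2, 3, 4, 2, 2))

# split() returns a list of data frames named by the grouping values,
# and lapply() keeps those names on its result.
res <- lapply(split(it, it$ind), function(d) min(d$var1))

# Same idea for per-group models (mtcars is an assumed example dataset).
fits  <- lapply(split(mtcars, mtcars$cyl),
                function(d) lm(mpg ~ wt, data = d))

# One column of coefficients per group, labelled by the group value.
coefs <- sapply(fits, coef)
```

Here names(res) and colnames(coefs) carry the group labels, so the results can be related back to the original groups directly.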
