Assigning names to the list output of dplyr do operation
Question
The do function in the dplyr package usually produces a list. Is there a way to assign names to that list depending on the input to do? Specifically, I pass it the group_by result and would like the names of the list to indicate which group each list element corresponds to.
Here is a toy example of what I want to achieve:
> it = data.frame(ind=c("a","a","b","b","c"),var1=c(1,2,3,4,5), var2=c(2,3,4,2,2))
> group_by(it,ind)%.%summarise(min(var1))
Source: local data frame [3 x 2]
ind min(var1)
1 c 5
2 b 3
3 a 1
Now with do:
> do(group_by(it,ind),function(x)min(x[,"var1"]))
[[1]]
[1] 5
[[2]]
[1] 3
[[3]]
[1] 1
Ideally the names should be c("c","b","a").
Is this possible? And why does dplyr reverse the sorting of the groups? Note that in my case the result of the do operation is an lm object.
Edit: A comment asks for a realistic example; here is what I had in mind. I fit models depending on the data (dummy code):
res <- do(group_by(data,Index),lm,formula=y~x)
Now I want to do various things like
sapply(res,coef)
So I want to relate the results to the original dataset, in this case to know which Index the coefficients correspond to.
Edit 2: The desired behaviour can be achieved with the dlply function from the plyr package:
dlply(it,~ind,function(d)min(d[,"var1"]))
$a
[1] 1
$b
[1] 3
$c
[1] 5
attr(,"split_type")
[1] "data.frame"
attr(,"split_labels")
ind
1 a
2 b
3 c
I am looking at whether it is possible to replicate this behaviour with dplyr, preferably with minimal intervention.
Recommended answer
Try this modified version of do.grouped_df:
do2 <- function (.data, .f, ...) {
  # Build the group indices if they have not been computed yet
  if (is.null(attr(.data, "indices"))) {
    .data <- dplyr:::grouped_df_impl(.data, attr(.data, "vars"),
                                     attr(.data, "drop"))
  }
  index <- attr(.data, "indices")
  out <- vector("list", length(index))
  # Apply .f to the rows of each group (the stored indices are 0-based)
  for (i in seq_along(index)) {
    subs <- .data[index[[i]] + 1L, , drop = FALSE]
    out[[i]] <- .f(subs, ...)
  }
  # Name each result after its group label
  nms <- as.character(attr(.data, "labels")[[1]])
  setNames(out, nms)
}
library(dplyr)
it %.% group_by(ind) %.% do2(function(x) min(x$var1))
which gives:
$a
[1] 1
$b
[1] 3
$c
[1] 5
It could also be combined with fn$ from the gsubfn package like this to shorten it slightly:
library(dplyr)
library(gsubfn)
it %.% group_by(ind) %.% fn$do2(~ min(x$var1))
giving the same answer.
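For comparison, here is a minimal base R sketch that also yields a list named by group, without relying on dplyr internals. It is not part of the answer above; it assumes that split() followed by lapply() is acceptable for the use case, and it renames the second column of the toy data to var2 (an assumption, since the question repeats the name var1).

```r
# Toy data from the question; the second numeric column is called var2 here
# (an assumption, the question uses var1 twice).
it <- data.frame(ind  = c("a", "a", "b", "b", "c"),
                 var1 = c(1, 2, 3, 4, 5),
                 var2 = c(2, 3, 4, 2, 2))

# split() returns a list of data frames named by group level,
# and lapply() keeps those names on its result.
mins <- lapply(split(it, it$ind), function(d) min(d$var1))
names(mins)
# "a" "b" "c"

# The same pattern for per-group models: sapply() carries the group
# names through as column names of the coefficient matrix.
# Group "c" has a single row, so its slope cannot be estimated.
fits  <- lapply(split(it, it$ind), function(d) lm(var2 ~ var1, data = d))
coefs <- sapply(fits, coef)
colnames(coefs)
```

Because the names travel with the list, each column of coefs can be matched back to its group directly, which is the relation between coefficients and Index that the question asks for.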