Using dplyr's do to perform bootstrap replications


Question

I'm interested in using dplyr to construct bootstrap replications (repeated analyses where the data is first sampled with replacement each time). In a linked post, Hadley Wickham provides some code for repeating bootstrapped analyses in an efficient way:

bootstrap <- function(df, m) {
  n <- nrow(df)

  attr(df, "indices") <- replicate(m, sample(n, replace = TRUE), 
    simplify = FALSE)
  attr(df, "drop") <- TRUE
  attr(df, "group_sizes") <- rep(n, m)
  attr(df, "biggest_group_size") <- n
  attr(df, "labels") <- data.frame(replicate = 1:m)
  attr(df, "vars") <- list(quote(boot)) # list(substitute(bootstrap(m)))
  class(df) <- c("grouped_df", "tbl_df", "tbl", "data.frame")

  df
}

library(dplyr)
mboot <- bootstrap(mtcars, 10)

# Works
mboot %>% summarise(mean(cyl))

While this function works well for summarise, it doesn't work for do when do contains a data.frame. (Imagine for now that the data.frame contains something useful such as the results of the analysis we wish to bootstrap).

bootstrap(mtcars, 3) %>% do(data.frame(x=1:2))
# Error: index out of bounds

with traceback:

11: stop(list(message = "index out of bounds", call = NULL, cppstack = NULL))
10: .Call("dplyr_grouped_df_impl", PACKAGE = "dplyr", data, symbols, 
        drop)
9: grouped_df_impl(data, unname(vars), drop)
8: grouped_df(cbind_list(labels, out), groups)
7: label_output_dataframe(labels, out, groups(.data))
6: do.grouped_df(`bootstrap(mtcars, 3)`, data.frame(x = 1:2))
5: do(`bootstrap(mtcars, 3)`, data.frame(x = 1:2))
4: eval(expr, envir, enclos)
3: eval(e, env)
2: withVisible(eval(e, env))
1: bootstrap(mtcars, 3) %>% do(data.frame(x = 1:2))

I was able to work around this by performing two do steps and a group by:

bootstrap(mtcars, 10) %>% do(d=data.frame(x=1:2)) %>% group_by(replicate) %>% do(.$d[[1]])

but this seems to require a lot of extra, and somewhat clumsy, steps (and also gets a warning, Grouping rowwise data frame strips rowwise nature). I'm also aware that I could replicate the data into ten replications first with something like

data.frame(boot=1:10) %>% group_by(boot) %>% do(sample_n(mtcars, nrow(mtcars), replace=TRUE))

but if the data or the number of bootstrap replicates is large this is extremely inefficient in memory.
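For comparison, the memory-light idea behind the bootstrap() function above can be sketched in base R alone: store only the resampled row indices and apply the analysis to one resample at a time. This is a minimal sketch, not the dplyr mechanism itself; the names boot_indices and boot_means and the fixed seed are assumptions made for illustration.

```r
# Base-R sketch: keep m index vectors, never m full copies of the data.
set.seed(42)  # assumption: fixed seed only so the run is reproducible

df <- mtcars
n  <- nrow(df)
m  <- 10

# Each list element is one resample, stored as row positions into df.
boot_indices <- replicate(m, sample(n, replace = TRUE), simplify = FALSE)

# Apply the analysis to one resample at a time; only the resampled
# values of the column in use are materialised at any moment.
boot_means <- vapply(
  boot_indices,
  function(idx) mean(df$cyl[idx]),
  numeric(1)
)

boot_means
```

The cost is m integer vectors of length n, rather than m copies of every column.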

Is there a way, perhaps by altering the bootstrap setup function, that I can perform these replicates with bootstrap(mtcars, 3) %>% do(data.frame(x = 1:2))?

Recommended answer

I think it is a small bug in the bootstrap function. The vars attribute should match the column name in the data.frame in the labels attribute. But in the function, the vars attribute is called "boot", and the column name is replicate. So, if you make this minor change:

bootstrap <- function(df, m) {
  n <- nrow(df)

  attr(df, "indices") <- replicate(m, sample(n, replace = TRUE), 
                                   simplify = FALSE)
  attr(df, "drop") <- TRUE
  attr(df, "group_sizes") <- rep(n, m)
  attr(df, "biggest_group_size") <- n
  attr(df, "labels") <- data.frame(replicate = 1:m)
  attr(df, "vars") <- list(quote(replicate)) # Change
#  attr(df, "vars") <- list(quote(boot)) # list(substitute(bootstrap(m)))
  class(df) <- c("grouped_df", "tbl_df", "tbl", "data.frame")

  df
}

Then it works as expected:

bootstrap(mtcars, 3) %>% do(data.frame(x=1:2))
# Source: local data frame [6 x 2]
# Groups: replicate

#   replicate x
# 1         1 1
# 2         1 2
# 3         2 1
# 4         2 2
# 5         3 1
# 6         3 2
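The mismatch described in this answer can be checked without dplyr, by comparing the symbols stored in the vars attribute against the column names of the labels data frame. The following is a minimal base-R sketch; vars_match_labels is a hypothetical helper written for illustration and is not part of dplyr.

```r
# Base-R sketch of the name check; no dplyr needed.
labels   <- data.frame(replicate = 1:3)
vars_old <- list(quote(boot))       # as in the question's function
vars_new <- list(quote(replicate))  # as in the corrected function

# Hypothetical helper: TRUE when every grouping symbol names a label column.
vars_match_labels <- function(vars, labels) {
  all(vapply(vars, as.character, character(1)) %in% names(labels))
}

vars_match_labels(vars_old, labels)  # FALSE
vars_match_labels(vars_new, labels)  # TRUE
```

A check like this could be run on a hand-built grouped data frame before piping it into do.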
