dplyr: Sample size greater than population size

Views: 104 Published: 2020/10/26 4:53:12 Tags: r dplyr


Question

I have a data frame:

> class(dataset)
[1] "grouped_df" "tbl_df"     "tbl"        "data.frame"
> dim(dataset)
[1] 64480    39

from which I want to draw 50,000 samples:

> dataset %>% dplyr::sample_n(50000)

but it always gives me this error:

Error: Sample size (50000) greater than population size (1). Do you want to replace = TRUE?

This, on the other hand, works:

> dim(dataset[1] %>% dplyr::sample_n(50000))
[1] 50000     1

So why is my population size (1)? Does that have something to do with grouping?

Recommended answer

Yes, it probably has to do with grouping. As the output of class(dataset) shows, your data is currently grouped (note the grouped_df class), and sample_n() samples within each group rather than from the whole table. At least one group apparently has too few observations (the reported population size of 1 points to a group with a single row) to draw 50000 observations without replacement.
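The per-group behaviour can be reproduced on a small scale. The sketch below is an illustration only: it uses the built-in mtcars data grouped by cyl as a hypothetical stand-in for dataset, and the sample sizes 10 and 5 are arbitrary. The exact wording of the error message may differ between dplyr versions.

```r
library(dplyr)

# Hypothetical stand-in for `dataset`: mtcars grouped by `cyl`,
# which gives three groups of 11, 7, and 14 rows.
grouped <- mtcars %>% group_by(cyl)

# Inspect the group sizes; the smallest group limits how many rows
# can be drawn per group without replacement.
group_sizes <- grouped %>% summarise(n = n())
print(group_sizes)

# sample_n() draws rows from each group separately, so asking for 10
# rows fails: the cyl == 6 group only has 7.
result <- tryCatch(
  grouped %>% sample_n(10),
  error = function(e) conditionMessage(e)
)
print(result)

# A size that fits every group works and returns that many rows per group.
per_group <- grouped %>% sample_n(5)
nrow(per_group)  # 3 groups x 5 rows = 15
```

Checking the group sizes first is a quick way to confirm whether grouping is the cause before changing the sampling call.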

To resolve this, you can either ungroup your data before sampling:

dataset %>% ungroup() %>% sample_n(50000)

Or you can sample with replacement:

dataset %>% sample_n(50000, replace = TRUE)
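Note that the two options do not return the same thing. The following sketch, again using mtcars grouped by cyl as a hypothetical stand-in for dataset and an arbitrary sample size of 20, shows the difference in the number of rows returned.

```r
library(dplyr)

# Hypothetical stand-in for `dataset`: 32 rows in 3 groups.
grouped <- mtcars %>% group_by(cyl)

# Option 1: drop the grouping, then sample from the whole table.
whole <- grouped %>% ungroup() %>% sample_n(20)
nrow(whole)      # 20 rows in total, no duplicates

# Option 2: keep the grouping and sample with replacement.
# Every group contributes 20 rows, so small groups repeat rows.
with_repl <- grouped %>% sample_n(20, replace = TRUE)
nrow(with_repl)  # 20 rows per group, 60 in total
```

If the goal is 50000 distinct rows out of the 64480, ungrouping first is the option that matches it. In dplyr 1.0.0 and later, sample_n() is marked as superseded in favour of slice_sample(), which also works per group on grouped data.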
