R sampling into groups of specific size based on count data


Question

I want to take a df like the one below and cut/bin/group/sample it into groups of size = 20. Ideally, this "binning" occurs randomly across IDs rather than consecutively from the top row to the bottom row.

E.g. IDs 2, 29 and 71 have counts of 7, 7 and 6, and would fit nicely into a "bin" of size = 20. I want to achieve the minimum number of bins and do not care about the order of IDs (the more random they are, the better).

set.seed(123)
df <- data.frame(
  ID = as.numeric(1:100),
  Count = as.numeric(sample(1:8, size = 100, replace = TRUE)))

The desired outcome would be a dataframe/tibble looking something like the one below, with optimal random sampling and a minimal number of bins.

Bin_size = 20 is the parameter set by me (the ideal outcome is exactly 20, and < 20 is OK, but > 20 is not OK). Each bin should be given a number (e.g. if I have 10 bins, I would like them to be called Bin_number 1-10).

ID, Count, Bin_size, Bin_number
2, 7, 20, 1
29, 7, 20, 1
71, 6, 20, 1

Where 7+7+6 = 20 (etc.)

Any help with this would be much appreciated. I have been wondering about cumsum and group_by but could not figure it out.

If you need more details, I'm happy to provide them. Thanks!

Recommended answer

The BBmisc package has a simple (though not optimized) bin packing algorithm that might be useful:

library(BBmisc)
library(dplyr)

df %>%
  as_tibble() %>%
  mutate(bin = binPack(Count, 20),
         bin_size = ave(Count, bin, FUN = sum)) %>%
  arrange(bin)

# A tibble: 100 x 4
      ID Count   bin bin_size
   <dbl> <dbl> <int>    <dbl>
 1    11     4     1       20
 2    17     8     1       20
 3    27     8     1       20
 4    22     4     2       20
 5    42     8     2       20
 6    56     8     2       20
 7    34     4     3       20
 8    62     8     3       20
 9    79     8     3       20
10    40     4     4       20
# ... with 90 more rows
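
If you would rather not depend on BBmisc, or want IDs with equal counts to be spread across bins at random, the same idea can be sketched in base R. This is only a rough sketch of a randomised first-fit-decreasing heuristic, not what binPack does internally: pack_bins and the Bin_total column are names made up for this example, and like any greedy heuristic it is not guaranteed to reach the true minimum number of bins.

```r
# Hypothetical helper (not part of BBmisc or dplyr): randomised
# first-fit decreasing. Returns one bin number per element of `counts`.
pack_bins <- function(counts, capacity = 20) {
  stopifnot(all(counts <= capacity))
  n <- length(counts)
  # Shuffle first so that items with equal counts are visited in random
  # order, then process items from largest to smallest.
  shuffled <- sample.int(n)
  ord <- shuffled[order(counts[shuffled], decreasing = TRUE)]
  bin <- integer(n)
  load <- numeric(0)              # running total of each open bin
  for (i in ord) {
    fits <- which(load + counts[i] <= capacity)
    if (length(fits) > 0) {
      b <- fits[1]                # first open bin with enough room
      load[b] <- load[b] + counts[i]
    } else {
      load <- c(load, counts[i])  # nothing fits, so open a new bin
      b <- length(load)
    }
    bin[i] <- b
  }
  bin
}

set.seed(123)
df <- data.frame(
  ID = 1:100,
  Count = sample(1:8, size = 100, replace = TRUE))

df$Bin_number <- pack_bins(df$Count, capacity = 20)
df$Bin_total <- ave(df$Count, df$Bin_number, FUN = sum)  # sum of counts per bin
df <- df[order(df$Bin_number), ]

# Sanity checks: no bin exceeds the capacity, and the number of bins
# can be compared with the theoretical lower bound.
max(df$Bin_total)
c(bins_used = max(df$Bin_number),
  lower_bound = ceiling(sum(df$Count) / 20))
```

Bins are numbered 1, 2, ... in the order they are opened, which matches the Bin_number column asked for in the question. If bins_used equals lower_bound the packing is optimal for that data; if not, rerunning with a different seed may or may not find a tighter packing.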
