Combine observations based on the variable ID if at least 5 IDs are combined
Question
Last week I posted a question about this. The idea was to write a loop that determines the content of a dataset by combining observations based on the variable "id".
For example:
- dataset 1: combinations of id 1, 2, 3, 4, 5, 6, 7, 8...
- dataset 2: combinations of id 1, 2, 3
- dataset 3: combinations of id 2, 3, 4, 5
- dataset 4: combinations of id 5, 6, 7, 8, 9, 10...
I got a perfect answer to the question:
result <- NULL  # start empty; rows are appended inside the loop
for (i in 2:max(o$id)) {
  combis <- combn(unique(o$id), i)  # each column is one combination of i ids
  for (j in 1:ncol(combis)) {
    sub <- o[o$id %in% combis[, j], ]
    out <- sub[1, ]  # use your function
    out$label <- paste(combis[, j], collapse = '')  # label so you know which combination this result belongs to
    result <- rbind(result, out)  # append to previous output
  }
}
However, my question now is the following: is there a way to specify that I only want combinations of at least 5 ids? The process takes a lot of computing time, and I noticed that small datasets (with fewer than 5 different ids) give biased results.
A sample of the dataset and the full code to reproduce the example can be found through this link. Please be aware that the entire code can take a while to run unless it is restricted to combinations of at least 5 ids.
Recommended answer
You can start the loop at 5:
for (i in 5:max(o$id)) {
  combis <- combn(unique(o$id), i)
  ...
This way, there are at least 5 elements in each combination (see ?combn).
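As a rough illustration of the same idea, here is a self-contained sketch on a toy data frame. The data, the `min_size` variable, and the `pieces` list are assumptions made up for this example, not part of the original code. It also bounds the loop by the number of distinct ids rather than `max(o$id)`, which only matters if the ids are not numbered 1 to n.

```r
# Toy data: 6 ids with 2 observations each (hypothetical stand-in for `o`).
o <- data.frame(id = rep(1:6, each = 2), value = seq_len(12))

ids <- unique(o$id)
min_size <- 5  # smallest number of ids allowed in one combination

# Guard: min_size:length(ids) would count downwards if there were too few ids.
stopifnot(length(ids) >= min_size)

pieces <- list()
for (i in min_size:length(ids)) {
  combis <- combn(ids, i)  # each column is one combination of i ids
  for (j in seq_len(ncol(combis))) {
    sub <- o[o$id %in% combis[, j], ]
    out <- sub[1, ]  # placeholder: replace with your own function of `sub`
    out$label <- paste(combis[, j], collapse = "-")  # which ids were combined
    pieces[[length(pieces) + 1]] <- out
  }
}
result <- do.call(rbind, pieces)  # bind once instead of growing with rbind
```

With 6 ids this leaves `sum(choose(6, 5:6))`, that is 7 combinations, instead of the `sum(choose(6, 2:6))`, that is 57, produced when the loop starts at 2. The `"-"` separator in the label is a design choice for this sketch: it keeps labels unambiguous once ids have more than one digit.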