How to randomly choose only one row in each group
Question
Say I have a data frame as follows:
df <- data.frame(Region = c("A","A","A","B","B","C","D","D","D","D"),
Combo = c(1,2,3,1,2,1,1,2,3,4))
> df
Region Combo
1 A 1
2 A 2
3 A 3
4 B 1
5 B 2
6 C 1
7 D 1
8 D 2
9 D 3
10 D 4
What I would like to do is, for each Region (A, B, C, D), randomly choose only one of the possible combos for that region.
If the chosen combination were indicated by a binary variable, the result might look something like this:
Region Combo RandomlyChosen
1 A 1 1
2 A 2 0
3 A 3 0
4 B 1 0
5 B 2 1
6 C 1 1
7 D 1 0
8 D 2 0
9 D 3 1
10 D 4 0
I'm aware of the sample function, but just don't know how to choose only one combo within each region.
I regularly use data.table, so any solutions using that are welcome, though solutions not using data.table are equally welcome.
Thanks!
Recommended answer
In plain R you can use sample() within tapply():
df$Chosen <- 0
df[-tapply(-seq_along(df$Region),df$Region, sample, size=1),]$Chosen <- 1
df
Region Combo Chosen
1 A 1 0
2 A 2 1
3 A 3 0
4 B 1 1
5 B 2 0
6 C 1 1
7 D 1 0
8 D 2 0
9 D 3 1
10 D 4 0
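Note the -(-selected_row_number) trick: the row numbers are negated before sampling and negated back afterwards. This avoids sample() drawing from 1 to n when a group contains a single row number n, because that special case applies only to a single number that is at least 1.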
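As a possible alternative to the negation trick, here is a minimal sketch that draws a position within each group using sample.int(), which always samples from 1 to n and so has no single-row special case. The helper name flag_one is made up for illustration and is not part of the original answer. Since the question mentions data.table, a variant is included as well, on the assumption that the package is installed; it is skipped otherwise.

```r
# Example data from the question.
df <- data.frame(Region = c("A","A","A","B","B","C","D","D","D","D"),
                 Combo  = c(1,2,3,1,2,1,1,2,3,4))

# flag_one is a hypothetical helper: given the row numbers of one group,
# it returns a 0/1 vector with exactly one 1 at a randomly drawn position.
# sample.int(n, 1) always draws from 1..n, so one-row groups are safe.
flag_one <- function(i) as.integer(seq_along(i) == sample.int(length(i), 1))

# ave() applies flag_one within each Region and keeps the original row order.
df$Chosen <- ave(seq_along(df$Region), df$Region, FUN = flag_one)

# data.table variant (assumes the data.table package is available).
if (requireNamespace("data.table", quietly = TRUE)) {
  library(data.table)
  dt <- as.data.table(df[, c("Region", "Combo")])
  # .N is the number of rows in the current group.
  dt[, Chosen := as.integer(seq_len(.N) == sample.int(.N, 1L)), by = Region]
}
```

Because the draw is random, the flagged rows differ from run to run; calling set.seed() beforehand makes the result reproducible.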