Take the subsets of a data.frame with the same feature and select a single row from each subset
Problem description
Suppose I have a matrix in R as follows:
ID Value
1 10
2 5
2 8
3 15
4 7
4 9
...
What I need is a random sample in which every ID is represented once and only once.
That means the row with ID 1 will be chosen, then one of the two rows with ID 2, then the row with ID 3, then one of the two rows with ID 4, and so on.
An ID can have more than two duplicate rows.
I'm trying to figure out the most R-esque way to do this, without manually subsetting the data and sampling from each subset.
Thanks!
Recommended answer
Use tapply across the rownames and grab a sample of 1 in each ID group:
dat[tapply(rownames(dat),dat$ID,FUN=sample,1),]
# ID Value
#1 1 10
#3 2 8
#4 3 15
#6 4 9
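For a reproducible version of the call above, here is a minimal sketch. The construction of dat and the seed are assumptions made for illustration; the question does not show how the data was created.

```r
# Rebuild the example data from the question (assumed to be a data.frame)
dat <- data.frame(ID = c(1, 2, 2, 3, 4, 4),
                  Value = c(10, 5, 8, 15, 7, 9))

set.seed(1)  # arbitrary seed, only so the draw is repeatable

# For each ID group, pick one row name at random
picked <- tapply(rownames(dat), dat$ID, FUN = sample, 1)

# Index the data.frame by the chosen row names
res <- dat[picked, ]

# Every ID is represented once and only once
stopifnot(!any(duplicated(res$ID)),
          setequal(res$ID, unique(dat$ID)))
```

The row chosen for IDs 2 and 4 changes from run to run unless a seed is set.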
If your data is truly a matrix and not a data.frame, you can work around this too, with:
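# a matrix has no $ accessor, so take the ID column by name;
# convert the picked positions back to integers before indexing
dat[as.integer(tapply(as.character(seq_len(nrow(dat))), dat[, "ID"], FUN = sample, 1)), ]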
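As a self-contained sketch of the matrix case: the matrix m below is a hypothetical stand-in for the question's data, assumed to have a column named ID and no row names. Because a matrix without row names cannot be indexed by character strings, the sampled positions are converted back to integers first.

```r
# Hypothetical matrix version of the example data (no row names)
m <- cbind(ID = c(1, 2, 2, 3, 4, 4),
           Value = c(10, 5, 8, 15, 7, 9))

# Sample one row position per ID; positions are passed as character
# strings so that sample() never receives a single number
pos <- tapply(as.character(seq_len(nrow(m))), m[, "ID"], FUN = sample, 1)

# Convert the chosen positions back to integers before indexing
res_m <- m[as.integer(pos), , drop = FALSE]

# One row per ID
stopifnot(nrow(res_m) == 4, !any(duplicated(res_m[, "ID"])))
```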
Don't be tempted to remove the as.character: when sample is passed a single number n, it samples from 1:n instead of returning n, so a group with only one row would give unintended results. E.g.
replicate(10, sample(4,1) )
#[1] 1 1 4 2 1 2 2 2 3 4
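To see the difference side by side, the following sketch compares sampling from the number 4 with sampling from the string "4". The seed and variable names are arbitrary choices for illustration.

```r
set.seed(42)  # arbitrary seed for repeatability

# A single number n >= 1 is treated as 1:n, so values other than 4 can appear
num_draws <- replicate(10, sample(4, 1))

# A single character string is treated as a one-element vector,
# so the only possible draw is "4"
chr_draws <- replicate(10, sample("4", 1))

stopifnot(all(num_draws %in% 1:4),
          all(chr_draws == "4"))
```

This is why the row positions are converted with as.character before being handed to sample.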