Take the subsets of a data.frame that share the same ID and select a single row from each subset

Published: 2020/7/4 0:14:49 Tags: r, random

Suppose I have a matrix in R as follows:

ID Value
1 10
2 5
2 8
3 15
4 7
4 9
...

What I need is a random sample where every element is represented once and only once.

That means ID 1 will be chosen, one of the two rows with ID 2 will be chosen, ID 3 will be chosen, one of the two rows with ID 4 will be chosen, and so on.

There can be more than two duplicates.

What is the most idiomatic R way to do this without manually subsetting the data and sampling each subset?

Thanks!

Use tapply over the row names to draw a sample of size 1 within each ID group, then index dat with the sampled row names:

dat[tapply(rownames(dat), dat$ID, FUN = sample, 1), ]

#  ID Value
#1  1    10
#3  2     8
#4  3    15
#6  4     9
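
A self-contained run of the tapply approach, assuming dat holds only the six rows shown in the question (the "..." rows are omitted). Which row is picked for IDs 2 and 4 depends on the seed, so no fixed output is claimed here; the point is that every ID appears exactly once.

```r
# Example data: only the rows shown in the question
dat <- data.frame(ID = c(1, 2, 2, 3, 4, 4),
                  Value = c(10, 5, 8, 15, 7, 9))

set.seed(1)  # for reproducibility; any seed works

# rownames(dat) is a character vector ("1", "2", ...), so sample() always
# draws from the group's own row names, even for one-row groups
picked <- tapply(rownames(dat), dat$ID, FUN = sample, 1)

# Index the data.frame by the sampled row names: one row per ID
result <- dat[picked, ]
print(result)
```

Because tapply groups by the sorted levels of ID, the result comes back ordered by ID.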

If your data is truly a matrix and not a data.frame, you can work around this too, with:

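# dat$ID fails on a matrix, and a matrix without rownames cannot be indexed by name
dat[as.integer(tapply(as.character(seq_len(nrow(dat))), dat[, "ID"], FUN = sample, 1)), ]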
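
A runnable sketch of the matrix case, assuming dat is a numeric matrix with columns ID and Value and no row names. Two details matter: the `$` operator is not defined for matrices, so the ID column is read with `dat[, "ID"]`, and the sampled strings are converted back to integers because there are no row names to match them against.

```r
# Matrix version of the question's data (no rownames)
dat <- cbind(ID = c(1, 2, 2, 3, 4, 4),
             Value = c(10, 5, 8, 15, 7, 9))

set.seed(42)

# Sample row numbers as strings so a one-row group is not expanded to 1:n,
# then turn them back into integer row indices
picked <- as.integer(tapply(as.character(seq_len(nrow(dat))),
                            dat[, "ID"], FUN = sample, 1))

# drop = FALSE keeps a matrix even if only one row were selected
result <- dat[picked, , drop = FALSE]
print(result)
```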

Don't be tempted to remove the as.character: when sample is passed a single positive number n, it samples from 1:n instead of returning n, so a group containing only one row would get a random row number. E.g.:

replicate(10, sample(4,1) )
#[1] 1 1 4 2 1 2 2 2 3 4
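
An alternative that avoids the as.character round trip is the resample wrapper given in the examples of R's ?sample help page, which always draws from the elements of x, even when x has length 1. A sketch applying it to integer row numbers, again assuming dat holds only the six rows from the question:

```r
# Wrapper from the ?sample examples: index x by a sampled position,
# so a length-1 x is returned as-is instead of being expanded to 1:x
resample <- function(x, ...) x[sample.int(length(x), ...)]

set.seed(7)
print(replicate(10, resample(4, 1)))  # always 4

dat <- data.frame(ID = c(1, 2, 2, 3, 4, 4),
                  Value = c(10, 5, 8, 15, 7, 9))

# Sample integer row numbers directly within each ID group
rows <- tapply(seq_len(nrow(dat)), dat$ID, FUN = resample, 1)
result <- dat[as.vector(rows), ]
print(result)
```

For a matrix, the same call works with `dat[, "ID"]` in place of `dat$ID`, since the sampled values are already integer row indices.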
