Take the subsets of a data.frame with the same feature and select a single row from each subset

Question

Suppose I have a matrix in R as follows:

ID Value
1 10
2 5
2 8
3 15
4 7
4 9
...

What I need is a random sample in which every ID is represented once and only once.

That means ID 1 will be chosen, then one of the two rows with ID 2, then ID 3, then one of the two rows with ID 4, and so on.

An ID can have more than two duplicate rows.

What is the most R-esque way to do this, without subsetting and then sampling the subsets?

Thanks!

Recommended answer

Use tapply across the rownames and grab a sample of 1 within each ID group:

dat[tapply(rownames(dat),dat$ID,FUN=sample,1),]

#  ID Value
#1  1    10
#3  2     8
#4  3    15
#6  4     9
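
To try the one-liner end to end, the sketch below rebuilds the question's rows as a data.frame and checks that each ID ends up in the result exactly once. The name `dat` follows the answer; the `set.seed` call and the `picked` variable are assumptions added here only to make the run repeatable and checkable.

```r
# Rebuild the example data from the question (rows hidden behind "..." are omitted)
dat <- data.frame(ID = c(1, 2, 2, 3, 4, 4),
                  Value = c(10, 5, 8, 15, 7, 9))

set.seed(1)  # assumption: added only so the draw is repeatable

# For each ID, sample one row name, then index the data.frame by those names
picked <- dat[tapply(rownames(dat), dat$ID, FUN = sample, 1), ]

# Every ID should appear exactly once in the result
stopifnot(!anyDuplicated(picked$ID), setequal(picked$ID, dat$ID))
picked
```

Because rownames are character strings, a group with a single row is returned unchanged by sample, which is what makes this approach safe for IDs such as 1 and 3.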

If your data is truly a matrix and not a data.frame, you can work around this too, with:

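# A matrix has no `$`, so use dat[, "ID"]; as.integer() turns the sampled labels back into row positions
dat[as.integer(tapply(as.character(seq_len(nrow(dat))), dat[, "ID"], FUN = sample, 1)), ]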

Don't be tempted to remove the as.character: when sample is passed a single number n (with n >= 1), it draws from 1:n instead of returning n itself, so a group with only one row could yield the wrong row index. E.g.:

replicate(10, sample(4,1) )
#[1] 1 1 4 2 1 2 2 2 3 4
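
The pitfall can be checked directly, and one possible way to sidestep it without the character conversion is to sample a position within each group rather than the values themselves. This is a sketch under stated assumptions, not the answer's own code: `m` is a stand-in matrix holding the question's values, and `pick_one` is a hypothetical helper name introduced here.

```r
# sample() treats a single number n >= 1 as "draw from 1:n"
set.seed(42)  # assumption: added only for repeatability
draws <- replicate(1000, sample(4, 1))
stopifnot(all(draws %in% 1:4), length(unique(draws)) > 1)

# A single character value is returned unchanged, which is why as.character() helps
stopifnot(identical(sample("4", 1), "4"))

# Hypothetical helper: pick one element of x by position, safe when length(x) == 1
pick_one <- function(x) x[sample.int(length(x), 1)]

# Stand-in matrix with the question's values (no row names)
m <- cbind(ID = c(1, 2, 2, 3, 4, 4), Value = c(10, 5, 8, 15, 7, 9))

# One row position per ID, then subset the matrix by position
rows <- as.integer(tapply(seq_len(nrow(m)), m[, "ID"], pick_one))
picked <- m[rows, , drop = FALSE]
picked
```

Sampling positions with sample.int(length(x), 1) behaves the same for groups of any size, so it works for both matrices and data.frames without relying on row names.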
