How to use conditional filtering of a data frame in R when trying to retain non-duplicated values in two columns

Views: 48 | Posted: 2021/5/2 20:54:25 | Tags: r, dplyr, filtering

Problem description

I have a data frame organized like this:

df <- data.frame(ID = c(rep("1111", 16), rep("2222", 16)),
                 subID = rep(c(rep("100", 4), rep("200", 4), rep("300", 4), rep("400", 4)), 2),
                 instance = rep(1:4, 8),
                 feature = rep(letters[1:4], 8))

It looks like this:

> df
     ID subID instance feature
1  1111   100        1       a
2  1111   100        2       b
3  1111   100        3       c
4  1111   100        4       d
5  1111   200        1       a
6  1111   200        2       b
7  1111   200        3       c
8  1111   200        4       d
9  1111   300        1       a
10 1111   300        2       b
11 1111   300        3       c
12 1111   300        4       d
13 1111   400        1       a
14 1111   400        2       b
15 1111   400        3       c
16 1111   400        4       d
17 2222   100        1       a
18 2222   100        2       b
19 2222   100        3       c
20 2222   100        4       d
21 2222   200        1       a
22 2222   200        2       b
23 2222   200        3       c
24 2222   200        4       d
25 2222   300        1       a
26 2222   300        2       b
27 2222   300        3       c
28 2222   300        4       d
29 2222   400        1       a
30 2222   400        2       b
31 2222   400        3       c
32 2222   400        4       d

In the real data set, all subIDs are unique samples collected from the same ID. You can think of them as samples collected at four time points from the same location. Within each ID, the subIDs 100 through 400 are each associated with exactly one of the 4 instances (e.g., 100 = 2, 200 = 4, 300 = 3, and 400 = 1). However, I do not know the actual linkage and will need to do a manual record review to assign it. To make my review quicker, I want to retain one row for each subID and one row for each instance, like so:

   ID  subID  instance  feature  truesubID
1 1111   100        1       a       
2 1111   200        2       b       
3 1111   300        3       c       
4 1111   400        4       d       
5 2222   100        1       a       
6 2222   200        2       b       
7 2222   300        3       c       
8 2222   400        4       d

This way, when I do the manual record review, I know what the possible subID numbers are, which ID they belong to, and how many instances to cross-reference. I will then fill in the true subID in the last column (e.g., subID=100 is really instance=4 for ID=1111, etc.).

Do you know how I could filter the first df to look like the second?

Thanks!
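
One possible way to get that shape, offered as a minimal sketch and not as the accepted answer: within each ID, number the distinct subIDs and the distinct instances in order of first appearance, then keep only the rows where the two numbers agree (first subID with first instance, second with second, and so on). The helper name `first_seen_rank` and the variables `sub_rank`, `inst_rank`, and `result` are hypothetical. The sketch assumes every ID has the same number of distinct subIDs as distinct instances and that every subID/instance combination is present, as in the example data.

```r
# Rebuild the example data frame from the question.
df <- data.frame(ID = c(rep("1111", 16), rep("2222", 16)),
                 subID = rep(c(rep("100", 4), rep("200", 4), rep("300", 4), rep("400", 4)), 2),
                 instance = rep(1:4, 8),
                 feature = rep(letters[1:4], 8))

# Hypothetical helper: position of each value among the distinct values,
# numbered in order of first appearance (1 for the first distinct value, etc.).
first_seen_rank <- function(x) match(x, unique(x))

# Rank subID and instance separately within each ID.
# subID is converted to integer codes first so that ave() returns integers.
sub_rank  <- ave(match(df$subID, unique(df$subID)), df$ID, FUN = first_seen_rank)
inst_rank <- ave(df$instance, df$ID, FUN = first_seen_rank)

# Keep the rows where the k-th subID meets the k-th instance.
result <- df[sub_rank == inst_rank, ]
rownames(result) <- NULL

# Empty column to be filled in by hand during the record review.
result$truesubID <- NA_character_

print(result)
```

The pairing itself is arbitrary: it only guarantees that each subID and each instance appears once per ID, which is what the review sheet needs. Since the question is tagged dplyr, the same condition could presumably be written inside `group_by(ID)` followed by `filter()`, comparing `match(subID, unique(subID))` with `match(instance, unique(instance))`.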
