Filter out observations present in specific pairs of samples in R

Question

I have a list of observations associated with samples. I would like to remove identical observations that occur in specific pairs of samples.

Example data:

sample observation
sample1A 5
sample1B 7
sample2A 10
sample2B 10
sample3A 10
sample3B 5

So the idea would be to group samples into pairs based on the letters A and B, and then for each of these pairs remove any rows with matching observations.

In the case above, only the observations from sample2A and sample2B would be excluded, because they come from the same sample, sample2, sampled on two separate occasions, and they share the same value. The output would look like:

sample observation
sample1A 5
sample1B 7
sample3A 10
sample3B 5

If it is possible to do this using dplyr, that would be extra useful, as I am trying to improve my proficiency with it.

I imagine that using group_by() to sort the data into groups based on the sample names and then using filter() could work but I am not sure how to handle the nested conditionals of first pairing based on a regular expression or string, then filtering by looking for matching values between rows.

Thanks in advance for your help.

Recommended answer

We can create a grouping variable by removing the last character of 'sample' (the trailing A or B), and then filter on the number of unique 'observation' values in each group: if n_distinct(observation) is greater than 1, the two samples in the pair differ and we keep the group.

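library(dplyr)
df2 %>%
  # pair key: the sample name without its trailing capital letter (A or B)
  group_by(grp = sub("[A-Z]$", "", sample)) %>%
  # keep only the pairs whose observations differ
  filter(n_distinct(observation) > 1) %>%
  ungroup() %>%
  select(-grp)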
# A tibble: 4 x 2
#    sample observation
#      <chr>       <int>
#1 sample1A           5
#2 sample1B           7
#3 sample3A          10
#4 sample3B           5
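The filter above drops a whole pair as soon as its two rows agree, which fits the example data where each sample has one observation. If a sample can carry several observations and only the values shared between A and B should go, one possible variation is to group by the pair key and the observation together. This is a sketch under that assumption; the data frame df_multi and its values are made up for illustration and are not from the question.

```r
library(dplyr)

# Hypothetical data (an assumption, not from the question):
# several observations per sample
df_multi <- data.frame(
  sample = c("sample1A", "sample1A", "sample1B", "sample1B",
             "sample2A", "sample2B"),
  observation = c(5L, 7L, 7L, 9L, 10L, 10L),
  stringsAsFactors = FALSE
)

result <- df_multi %>%
  # group by pair key (name minus trailing letter) and by observed value
  group_by(grp = sub("[A-Z]$", "", sample), observation) %>%
  # keep a value only if it occurs in a single member of the pair
  filter(n_distinct(sample) == 1) %>%
  ungroup() %>%
  select(-grp)

print(result)
```

Here the value 7 (present in both sample1A and sample1B) and the value 10 (present in both sample2A and sample2B) are removed, while 5 and 9 remain.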



data

df2 <- structure(list(sample = c("sample1A", "sample1B", "sample2A", 
"sample2B", "sample3A", "sample3B"), observation = c(5L, 7L, 
10L, 10L, 10L, 5L)), .Names = c("sample", "observation"),
 class = "data.frame", row.names = c(NA, -6L))
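For comparison, the same pair-wise filter can be sketched in base R without dplyr. The example rebuilds the question's data with data.frame() so it runs on its own; the names pair_id, n_unique and kept are chosen here for illustration.

```r
# Same data as in the question, rebuilt with data.frame()
df2 <- data.frame(
  sample = c("sample1A", "sample1B", "sample2A",
             "sample2B", "sample3A", "sample3B"),
  observation = c(5L, 7L, 10L, 10L, 10L, 5L),
  stringsAsFactors = FALSE
)

# Pair key: the sample name without its trailing capital letter
pair_id <- sub("[A-Z]$", "", df2$sample)

# For every row, the number of distinct observations within its pair
n_unique <- ave(df2$observation, pair_id,
                FUN = function(x) length(unique(x)))

# Keep only rows belonging to pairs whose observations differ
kept <- df2[n_unique > 1, ]
print(kept)
```

This keeps the rows for sample1A, sample1B, sample3A and sample3B, matching the dplyr result, although the original row names (1, 2, 5, 6) are retained.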
