R - find all unique values among subsets of a data frame

Problem description

I have a data frame with two columns. The first column defines subsets of the data. I want to find all values in the second column that appear in only one of the subsets defined by the first column.

For example, from:

df=data.frame(
  data_subsets=rep(LETTERS[1:2],each=5),
  data_values=c(1,2,3,4,5,2,3,4,6,7))

data_subsets data_values
      A           1
      A           2
      A           3
      A           4
      A           5
      B           2
      B           3
      B           4
      B           6
      B           7

I would want to extract the following data frame.

data_subsets   data_values
    A              1
    A              5
    B              6
    B              7

I have been playing around with duplicated() but I just can't seem to make it work. Any help is appreciated. There are a number of questions tackling similar problems; I hope I didn't overlook the answer in my searches!

EDIT

I modified the approach from @Matthew Lundberg of counting the number of elements and extracting from the data frame. For some reason his approach was not working with the data frame I had, so I came up with this, which is less elegant but gets the job done:

counts=rowSums(do.call("rbind",tapply(df$data_subsets,df$data_values,FUN=table)))
extract=names(counts)[counts==1]
df[match(extract,df$data_values),]
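One caveat about the snippet above: match() returns only the first matching row for each value, so a value repeated within a single subset would be reported once. A minimal sketch of the same count-then-extract idea, using table() for the counts and %in% for the row selection, could look like the following. The name unique_rows is introduced here purely for illustration.

```r
df <- data.frame(
  data_subsets = rep(LETTERS[1:2], each = 5),
  data_values = c(1, 2, 3, 4, 5, 2, 3, 4, 6, 7)
)

# Tally how often each value occurs anywhere in the column.
counts <- table(df$data_values)

# Values seen exactly once; names() returns strings, so convert back to numbers.
extract <- as.numeric(names(counts)[counts == 1])

# %in% flags every matching row, whereas match() returns only the first one.
unique_rows <- df[df$data_values %in% extract, ]
unique_rows
```

On the example data this selects rows 1, 5, 9 and 10, the same rows as the desired output in the question.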

Solution

First, find the count of each element in df$data_values:

 x <- sapply(df$data_values, function(x) sum(as.numeric(df$data_values == x)))

> x
 [1] 1 2 2 2 1 2 2 2 1 1
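The same per-row counts can also be obtained without comparing every value against the whole column. The sketch below uses base R's ave() with FUN = length, grouping the column by itself; the name x2 is an illustrative choice and not part of the original answer.

```r
df <- data.frame(
  data_subsets = rep(LETTERS[1:2], each = 5),
  data_values = c(1, 2, 3, 4, 5, 2, 3, 4, 6, 7)
)

# Group the values by themselves and replace each entry with its group size,
# i.e. the number of times that value occurs in the column.
x2 <- ave(df$data_values, df$data_values, FUN = length)
x2
```

This should give the same vector as x above, and it may scale better on long columns because it avoids the pairwise comparison inside sapply().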

Now extract the rows:

> df[x==1,]
   data_subsets data_values
1             A           1
5             A           5
9             B           6
10            B           7

Note that you missed "A 5" above. There is no "B 5".
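
Note that counting total occurrences only answers the question when no value is repeated inside a single subset. If the real data contains such repeats (possibly why the approach did not work on the asker's data frame, though that is a guess), one option is to count the number of distinct subsets per value instead. The following is a minimal sketch under that assumption; df2 is hypothetical data made up for illustration, and n_subsets and result are illustrative names.

```r
# Hypothetical data: the value 1 occurs twice, but only within subset A.
df2 <- data.frame(
  data_subsets = c("A", "A", "A", "B", "B"),
  data_values = c(1, 1, 2, 2, 3)
)

# For each row, count how many distinct subsets contain that row's value.
# ave() passes the row indices of each value group to FUN.
n_subsets <- ave(
  seq_len(nrow(df2)), df2$data_values,
  FUN = function(i) length(unique(df2$data_subsets[i]))
)

# Keep the rows whose value is confined to a single subset.
result <- df2[n_subsets == 1, ]
result
```

Here both "A 1" rows and the "B 3" row are kept, while the value 2, which appears in both A and B, is dropped.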
