R - find all unique values among subsets of a data frame
Problem description
I have a data frame with two columns. The first column defines subsets of the data. I want to find all values in the second column that only appear in one subset in the first column.
For example, from:
df=data.frame(
data_subsets=rep(LETTERS[1:2],each=5),
data_values=c(1,2,3,4,5,2,3,4,6,7))
data_subsets data_values
A 1
A 2
A 3
A 4
A 5
B 2
B 3
B 4
B 6
B 7
I would want to extract the following data frame.
data_subsets data_values
A 1
A 5
B 6
B 7
I have been playing around with duplicated(), but I just can't seem to make it work. Any help is appreciated. There are a number of topics tackling similar problems; I hope I didn't overlook the answer in my searches!
EDIT
I modified the approach from @Matthew Lundberg of counting the number of elements and extracting from the data frame. For some reason his approach was not working with the data frame I had, so I came up with this, which is less elegant but gets the job done:
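# factor() keeps both subset levels in every per-value table. Since R 4.0,
# data.frame() stores strings as character, so without it the tables differ in length.
counts=rowSums(do.call("rbind",tapply(factor(df$data_subsets),df$data_values,FUN=table)))
extract=names(counts)[counts==1]
df[match(extract,df$data_values),]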
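A possible alternative to the counting above, in case a value can repeat inside a single subset: count the number of distinct subsets each value appears in, rather than the number of rows. This is only a sketch; the function name unique_to_one_subset is made up for illustration, and it assumes the columns are named data_subsets and data_values as in the example.

```r
df <- data.frame(
  data_subsets = rep(LETTERS[1:2], each = 5),
  data_values = c(1, 2, 3, 4, 5, 2, 3, 4, 6, 7)
)

# Hypothetical helper: keep rows whose value occurs in exactly one subset.
unique_to_one_subset <- function(d) {
  # For every row, the number of distinct subsets that contain its value.
  # ave() applies FUN within each group of data_values and returns a
  # vector aligned with the rows of d.
  n_subsets <- ave(
    as.integer(factor(d$data_subsets)),
    d$data_values,
    FUN = function(s) length(unique(s))
  )
  d[n_subsets == 1, ]
}

result <- unique_to_one_subset(df)
print(result)
```

Unlike a plain row count, this keeps a value that shows up several times as long as all of its occurrences are in the same subset.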
First, find the count of each element in df$data_values:
x <- sapply(df$data_values, function(x) sum(as.numeric(df$data_values == x)))
> x
[1] 1 2 2 2 1 2 2 2 1 1
Now extract the rows:
> df[x==1,]
data_subsets data_values
1 A 1
5 A 5
9 B 6
10 B 7
Note that you missed "A 5" above. There is no "B 5".
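Since the question mentions duplicated(), here is a sketch of how it could be made to work. duplicated() on its own only flags the second and later occurrences, so it is combined with a second pass using fromLast = TRUE to flag the first occurrences as well. Like the row count above, this assumes a value never repeats within a single subset; the variable names dup and result are just illustrative.

```r
df <- data.frame(
  data_subsets = rep(LETTERS[1:2], each = 5),
  data_values = c(1, 2, 3, 4, 5, 2, 3, 4, 6, 7)
)

# TRUE for every row whose value occurs more than once anywhere in the column:
# the forward pass marks later copies, the backward pass marks earlier ones.
dup <- duplicated(df$data_values) | duplicated(df$data_values, fromLast = TRUE)

# Keep only the rows whose value occurs exactly once.
result <- df[!dup, ]
print(result)
```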