Inter-group (between-group) combinations of column A, grouped by column B
Problem description
I think this is a graph theory question: how many lines can we draw between two sets of points? It is not an area I'm familiar with.
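In graph-theory terms, the pairs being asked for are the edges of a complete multipartite graph with one part per group. A short counting sketch, assuming groups of sizes $n_1, \dots, n_k$: take all unordered pairs and subtract the pairs that fall inside a single group. For two groups this reduces to $n_1 n_2$, so the city example that follows (2 MA cities, 2 NY cities) should yield $2 \cdot 2 = 4$ pairs.

```latex
E = \binom{n}{2} - \sum_{g=1}^{k} \binom{n_g}{2},
\qquad n = \sum_{g=1}^{k} n_g

% Two groups:
\binom{n_1 + n_2}{2} - \binom{n_1}{2} - \binom{n_2}{2} = n_1 n_2
```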
For example:
df = data.frame(city = c('Boston', 'Cambridge', 'Long Island', 'NYC'),
state = c('MA', 'MA', 'NY', 'NY'))
city state
1 Boston MA
2 Cambridge MA
3 Long Island NY
4 NYC NY
Desired output (each pair of cities in different states):
Boston - Long Island
Boston - NYC
Cambridge - Long Island
Cambridge - NYC
In other words, I want to generate every city pair where the two cities are in different states.
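A minimal loop-free sketch for this small example, using only base R: `combn()` enumerates every unordered pair of row indices, and a logical filter keeps the pairs whose states differ. The names `idx`, `keep` and `pairs` are illustrative choices, not part of the original question.

```r
# Rebuild the example data; stringsAsFactors = FALSE keeps city/state as character
df <- data.frame(city  = c('Boston', 'Cambridge', 'Long Island', 'NYC'),
                 state = c('MA', 'MA', 'NY', 'NY'),
                 stringsAsFactors = FALSE)

# Every unordered pair of row indices, one pair per column (2 x choose(4, 2))
idx <- combn(nrow(df), 2)

# Keep only the pairs whose two cities lie in different states
keep <- df$state[idx[1, ]] != df$state[idx[2, ]]

pairs <- data.frame(city1 = df$city[idx[1, keep]],
                    city2 = df$city[idx[2, keep]],
                    stringsAsFactors = FALSE)
pairs
```

This should give the same four pairs listed above, in the same order, because `combn()` generates index pairs in lexicographic order.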
A more general example:
set.seed(123)
df = data.frame(value = 1:100,
group = letters[sample(1:26, 100, replace=T)])
> df
value group
1 1 e
2 2 m
3 3 g
4 4 o
5 5 p
6 6 a
7 7 i
8 8 o
9 9 i
10 10 h
11 11 p
12 12 h
... ... ...
I want all combinations (value1, value2), or equivalently (index1, index2), where value1 and value2 have different group labels.
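The same `combn()` idea scales to the general example. This sketch works on row indices (here identical to `value`, since `value = 1:100`) and checks the result against the counting formula "all pairs minus within-group pairs". Variable names are illustrative; note that `combn()` materialises all `choose(n, 2)` pairs, so memory grows quadratically with the number of rows.

```r
set.seed(123)
df <- data.frame(value = 1:100,
                 group = letters[sample(1:26, 100, replace = TRUE)],
                 stringsAsFactors = FALSE)

# All unordered index pairs (index1 < index2), one pair per row
idx <- t(combn(nrow(df), 2))

# Keep the pairs whose group labels differ; drop = FALSE keeps a matrix shape
cross <- idx[df$group[idx[, 1]] != df$group[idx[, 2]], , drop = FALSE]
colnames(cross) <- c("index1", "index2")

# Translate index pairs into value pairs
value_pairs <- cbind(value1 = df$value[cross[, "index1"]],
                     value2 = df$value[cross[, "index2"]])
head(value_pairs)

# Sanity check: choose(n, 2) minus the pairs inside each group (should be TRUE)
nrow(cross) == choose(nrow(df), 2) - sum(choose(table(df$group), 2))
```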
Recommended answer
A for loop, although often discouraged in R, can be used to get the desired result:
# No seed is set, so the sampled groups (and every output shown below) change from run to run
ddf = data.frame(value = 1:20, group = letters[sample(1:3, 20, replace=TRUE)])
head(ddf)
value group
1 1 b
2 2 b
3 3 b
4 4 c
5 5 a
6 6 a
for(i in seq_len(nrow(ddf))){
  # rows whose group differs from the group of row i
  tempdf = ddf[ddf$group != ddf[i,2],]
  cat(ddf[i,1], ': ', tempdf[,1], '\n')
}
1 : 4 5 6 8 9 10 13 15 17 19 20
2 : 4 5 6 8 9 10 13 15 17 19 20
3 : 4 5 6 8 9 10 13 15 17 19 20
4 : 1 2 3 5 6 7 8 11 12 13 14 16 18 19
5 : 1 2 3 4 7 9 10 11 12 14 15 16 17 18 20
6 : 1 2 3 4 7 9 10 11 12 14 15 16 17 18 20
7 : 4 5 6 8 9 10 13 15 17 19 20
8 : 1 2 3 4 7 9 10 11 12 14 15 16 17 18 20
9 : 1 2 3 5 6 7 8 11 12 13 14 16 18 19
10 : 1 2 3 5 6 7 8 11 12 13 14 16 18 19
11 : 4 5 6 8 9 10 13 15 17 19 20
12 : 4 5 6 8 9 10 13 15 17 19 20
13 : 1 2 3 4 7 9 10 11 12 14 15 16 17 18 20
14 : 4 5 6 8 9 10 13 15 17 19 20
15 : 1 2 3 5 6 7 8 11 12 13 14 16 18 19
16 : 4 5 6 8 9 10 13 15 17 19 20
17 : 1 2 3 5 6 7 8 11 12 13 14 16 18 19
18 : 4 5 6 8 9 10 13 15 17 19 20
19 : 1 2 3 4 7 9 10 11 12 14 15 16 17 18 20
20 : 1 2 3 5 6 7 8 11 12 13 14 16 18 19
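The per-value listing above can also be built without an explicit index loop by using `lapply()`, which returns one vector of cross-group partners per value. This is a sketch: `set.seed(42)` is an assumption added for reproducibility (the original sets no seed), so its output will not match the listing above.

```r
set.seed(42)  # assumed seed; the original snippet sets none
ddf <- data.frame(value = 1:20,
                  group = letters[sample(1:3, 20, replace = TRUE)],
                  stringsAsFactors = FALSE)

# For each row i, the values of all rows whose group differs from row i's group
partners <- lapply(seq_len(nrow(ddf)),
                   function(i) ddf$value[ddf$group != ddf$group[i]])
names(partners) <- ddf$value

# Print in the same "value : partners" layout as the loop above
for (v in names(partners)) cat(v, ':', partners[[v]], '\n')
```

Keeping the partners in a list, rather than only printing them, makes them available for later steps.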
Each pair can also be listed individually:
for(i in seq_len(nrow(ddf))){
  tempdf = ddf[ddf$group != ddf[i,2],]
  # seq_len() avoids looping over c(1, 0) when tempdf has no rows
  for(j in seq_len(nrow(tempdf))){
    cat(ddf[i,1], tempdf[j,1], '\n')
  }
}
1 4
1 5
1 6
1 8
1 9
1 10
1 13
1 15
1 17
1 19
1 20
2 4
2 5
2 6
2 8
2 9
2 10
2 13
2 15
2 17
....
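A vectorised alternative to the nested loop, sketched with base R's `outer()` and `which(arr.ind = TRUE)`: build an n x n logical matrix marking cross-group cells, then read off the positions of the `TRUE` cells. The seed and variable names are assumptions for illustration; using values as row positions relies on `value = 1:20`.

```r
set.seed(42)  # assumed seed, for reproducibility only
ddf <- data.frame(value = 1:20,
                  group = letters[sample(1:3, 20, replace = TRUE)],
                  stringsAsFactors = FALSE)

# 20 x 20 logical matrix: TRUE where row i and row j are in different groups
is_cross <- outer(ddf$group, ddf$group, "!=")

# Row/column positions of every TRUE cell, i.e. every ordered cross-group pair
ij <- which(is_cross, arr.ind = TRUE)

# Sort by first index, then second, to match the loop's output order
ij <- ij[order(ij[, "row"], ij[, "col"]), , drop = FALSE]
pairs <- cbind(first = ddf$value[ij[, "row"]], second = ddf$value[ij[, "col"]])
head(pairs)
```

The matrix costs n^2 logical cells, which is fine for a few thousand rows but worth keeping in mind for larger inputs.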
The pairs can easily be collected in another data.frame:
outdf = data.frame(first=numeric(), second=numeric())
for(i in seq_len(nrow(ddf))){
  tempdf = ddf[ddf$group != ddf[i,2],]
  for(j in seq_len(nrow(tempdf))){
    # append (value of row i, value of cross-group row j) as a new row
    outdf[nrow(outdf)+1,] = c(ddf[i,1], tempdf[j,1])
  }
}
head(outdf)
first second
1 1 3
2 1 4
3 1 5
4 1 7
5 1 8
6 1 9
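Growing `outdf` one row at a time copies the data frame on every append. A common R idiom, sketched here under the same assumed seed, is to build one small data frame per row with `lapply()` and bind them once with `do.call(rbind, ...)`; the loop's logic is otherwise unchanged.

```r
set.seed(42)  # assumed seed
ddf <- data.frame(value = 1:20,
                  group = letters[sample(1:3, 20, replace = TRUE)],
                  stringsAsFactors = FALSE)

# One small data frame per row i: i's value paired with every cross-group value
chunks <- lapply(seq_len(nrow(ddf)), function(i) {
  others <- ddf$value[ddf$group != ddf$group[i]]
  data.frame(first = rep(ddf$value[i], length(others)), second = others)
})

# Bind all chunks once at the end instead of appending row by row
outdf <- do.call(rbind, chunks)
head(outdf)
```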
To remove duplicates, first sort each pair so that the smaller value comes first:
# swap the two values whenever the second one is smaller
for(i in seq_len(nrow(outdf))){
  if(outdf[i,2] < outdf[i,1])
    outdf[i,] = c(outdf[i,2], outdf[i,1])
}
outdf
Alternatively, each row can be sorted with a single call:
outdf = data.frame(t(apply(outdf, 1, sort)))
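Because each row holds exactly two numbers, a vectorised sketch of the same sorting step can use `pmin()`/`pmax()` instead of `apply()`, which also keeps the column names `first` and `second`. Here `outdf` is rebuilt with `expand.grid()` under an assumed seed so the block runs on its own; indexing `group` by value relies on `value = 1:20`.

```r
set.seed(42)  # assumed seed
ddf <- data.frame(value = 1:20,
                  group = letters[sample(1:3, 20, replace = TRUE)],
                  stringsAsFactors = FALSE)

# Ordered cross-group pairs, equivalent to the loop-built outdf
g <- expand.grid(first = ddf$value, second = ddf$value)
outdf <- g[ddf$group[g$first] != ddf$group[g$second], ]

# Element-wise min/max puts the smaller value in 'first' for every row at once
outdf_sorted <- data.frame(first  = pmin(outdf$first, outdf$second),
                           second = pmax(outdf$first, outdf$second))
head(outdf_sorted)
```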
Then remove the duplicates:
outdf = outdf[!duplicated(outdf),]
The number of unique pairs is:
nrow(outdf)
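Since every unordered pair {a, b} appears in the ordered listing exactly twice, as (a, b) and (b, a), the sort-then-deduplicate steps can be sketched as a single filter that keeps rows with `first < second`. This block rebuilds `outdf` under an assumed seed and checks that both routes give the same count.

```r
set.seed(42)  # assumed seed
ddf <- data.frame(value = 1:20,
                  group = letters[sample(1:3, 20, replace = TRUE)],
                  stringsAsFactors = FALSE)

g <- expand.grid(first = ddf$value, second = ddf$value)
outdf <- g[ddf$group[g$first] != ddf$group[g$second], ]

# Keep one orientation of each pair: the one with the smaller value first
unique_pairs <- outdf[outdf$first < outdf$second, ]
nrow(unique_pairs)

# Same count as the sort-then-deduplicate route (should be TRUE)
sorted <- data.frame(t(apply(outdf, 1, sort)))
nrow(sorted[!duplicated(sorted), ]) == nrow(unique_pairs)
```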