Inter-group (between groups) combinations of column A, grouped by column B


Question

I think this is a graph theory question: how many lines can we draw between two sets of points? I'm not familiar with that area.

For example:

df = data.frame(city = c('Boston', 'Cambridge', 'Long Island', 'NYC'),
                state = c('MA', 'MA', 'NY', 'NY'))

         city state
1      Boston    MA
2   Cambridge    MA
3 Long Island    NY
4         NYC    NY

Split/grouped by state, the pairs I want are:

Boston - Long Island
Boston - NYC
Cambridge - Long Island
Cambridge - NYC

In other words, I want to generate every city pair where the two cities are in different states.

A more general example:

set.seed(123)
df = data.frame(value = 1:100,
                group = letters[sample(1:26, 100, replace = TRUE)])

> df
    value group
1       1     e
2       2     m
3       3     g
4       4     o
5       5     p
6       6     a
7       7     i
8       8     o
9       9     i
10     10     h
11     11     p
12     12     h
...    ...    ...

I want all combinations (value1, value2), or equivalently (index1, index2), where value1 and value2 have different group labels.

Recommended answer

A for loop, although discouraged in R, can be used to get the desired result:

ddf = data.frame(value = 1:20, group = letters[sample(1:3, 20, replace = TRUE)])
head(ddf)
  value group
1     1     b
2     2     b
3     3     b
4     4     c
5     5     a
6     6     a

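# For each row, print its value followed by all values from other groups.
for (i in seq_len(nrow(ddf))) {
    tempdf = ddf[ddf$group != ddf$group[i], ]
    cat(ddf$value[i], ': ', tempdf$value, '\n')
}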

1 :  4 5 6 8 9 10 13 15 17 19 20 
2 :  4 5 6 8 9 10 13 15 17 19 20 
3 :  4 5 6 8 9 10 13 15 17 19 20 
4 :  1 2 3 5 6 7 8 11 12 13 14 16 18 19 
5 :  1 2 3 4 7 9 10 11 12 14 15 16 17 18 20 
6 :  1 2 3 4 7 9 10 11 12 14 15 16 17 18 20 
7 :  4 5 6 8 9 10 13 15 17 19 20 
8 :  1 2 3 4 7 9 10 11 12 14 15 16 17 18 20 
9 :  1 2 3 5 6 7 8 11 12 13 14 16 18 19 
10 :  1 2 3 5 6 7 8 11 12 13 14 16 18 19 
11 :  4 5 6 8 9 10 13 15 17 19 20 
12 :  4 5 6 8 9 10 13 15 17 19 20 
13 :  1 2 3 4 7 9 10 11 12 14 15 16 17 18 20 
14 :  4 5 6 8 9 10 13 15 17 19 20 
15 :  1 2 3 5 6 7 8 11 12 13 14 16 18 19 
16 :  4 5 6 8 9 10 13 15 17 19 20 
17 :  1 2 3 5 6 7 8 11 12 13 14 16 18 19 
18 :  4 5 6 8 9 10 13 15 17 19 20 
19 :  1 2 3 4 7 9 10 11 12 14 15 16 17 18 20 
20 :  1 2 3 5 6 7 8 11 12 13 14 16 18 19 
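
The same per-row listing can also be collected into a list instead of printed, which makes it easier to reuse later. This is a minimal sketch and not part of the original answer; the seed and the name `others` are assumptions added for illustration.

```r
# Assumption: a seed is set only so this sketch is reproducible;
# the original answer did not set one.
set.seed(1)
ddf <- data.frame(value = 1:20,
                  group = letters[sample(1:3, 20, replace = TRUE)],
                  stringsAsFactors = FALSE)

# For each row i, keep the values whose group label differs from row i's.
# `others` is a hypothetical name: a list with one element per row of ddf.
others <- lapply(seq_len(nrow(ddf)), function(i) {
  ddf$value[ddf$group != ddf$group[i]]
})
names(others) <- ddf$value

# Partners of the first value:
others[["1"]]
```

Keeping the result in a list avoids parsing printed output, at the cost of holding every partner vector in memory.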

Each pair can be listed:

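for (i in seq_len(nrow(ddf))) {
    # rows belonging to any group other than row i's group
    tempdf = ddf[ddf$group != ddf$group[i], ]
    # seq_len() avoids the 1:0 trap when tempdf has no rows
    for (j in seq_len(nrow(tempdf))) {
        cat(ddf$value[i], tempdf$value[j], '\n')
    }
}
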
1 4 
1 5 
1 6 
1 8 
1 9 
1 10 
1 13 
1 15 
1 17 
1 19 
1 20 
2 4 
2 5 
2 6 
2 8 
2 9 
2 10 
2 13 
2 15 
2 17 
....
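
As a loop-free alternative, the ordered pairs can be read off a logical matrix that marks which rows have different group labels. The sketch below assumes base R only; the seed and the names `diff_group`, `idx` and `pairs` are illustrative and do not come from the original answer.

```r
# Assumption: seed added for reproducibility only.
set.seed(1)
ddf <- data.frame(value = 1:20,
                  group = letters[sample(1:3, 20, replace = TRUE)],
                  stringsAsFactors = FALSE)

# diff_group[i, j] is TRUE when rows i and j carry different group labels.
diff_group <- outer(ddf$group, ddf$group, FUN = "!=")

# which(..., arr.ind = TRUE) returns one (row, column) index pair per TRUE cell.
idx <- which(diff_group, arr.ind = TRUE)

# Ordered pairs: both (a, b) and (b, a) are present, as in the loop output.
pairs <- data.frame(first  = ddf$value[idx[, 1]],
                    second = ddf$value[idx[, 2]])
pairs <- pairs[order(pairs$first, pairs$second), ]
head(pairs)
```

The matrix has n x n cells, so this trades memory for speed and suits moderate n rather than very large inputs.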

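The pairs can easily be collected in another data.frame. Because ddf was sampled without a seed, the head(outdf) output shown after the code evidently comes from a different random draw than the listing above (there, values 1 and 3 share group b and could not be paired).

To create another data.frame: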

outdf = data.frame(first=numeric(), second=numeric())

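for (i in seq_len(nrow(ddf))) {
    # rows belonging to any group other than row i's group
    tempdf = ddf[ddf$group != ddf$group[i], ]
    for (j in seq_len(nrow(tempdf))) {
        # append the pair (value of row i, value of one row from another group)
        outdf[nrow(outdf) + 1, ] = c(ddf$value[i], tempdf$value[j])
    }
}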
head(outdf)
  first second
1     1      3
2     1      4
3     1      5
4     1      7
5     1      8
6     1      9
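
Growing a data.frame one row at a time gets slow as the number of pairs increases. For the city example from the question, a cross join followed by a filter is one possible alternative. This is a sketch under the assumption that base R's `merge()` is acceptable; the names `all_pairs` and `city_pairs` are made up for illustration.

```r
df <- data.frame(city  = c("Boston", "Cambridge", "Long Island", "NYC"),
                 state = c("MA", "MA", "NY", "NY"),
                 stringsAsFactors = FALSE)

# by = NULL makes merge() return the Cartesian product (every row with every row);
# column names shared by both inputs get the default suffixes .x and .y.
all_pairs <- merge(df, df, by = NULL)

# Keep pairs from different states; city.x < city.y keeps one ordering of each pair.
city_pairs <- all_pairs[all_pairs$state.x != all_pairs$state.y &
                        all_pairs$city.x < all_pairs$city.y,
                        c("city.x", "city.y")]
city_pairs
```

The `city.x < city.y` condition removes mirrored pairs up front, so no separate sorting and de-duplication pass is needed here.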

To remove duplicates, first sort each pair:

for(i in 1:nrow(outdf)){
    if(outdf[i,2] < outdf[i,1])
        outdf[i,] = c(outdf[i,2], outdf[i,1])
}
outdf

For sorting each row, the following R code may be preferred:

outdf = data.frame(t(apply(outdf, 1, sort)))

Then remove the duplicates:

outdf = outdf[!duplicated(outdf),]
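
The sort-then-deduplicate steps can be skipped if each unordered pair is generated only once in the first place. The following sketch uses `combn()` for that; it is an alternative to the answer's approach rather than the answer's own method, and the seed and the names `idx`, `keep` and `uniq_pairs` are assumptions.

```r
# Assumption: seed added for reproducibility only.
set.seed(1)
ddf <- data.frame(value = 1:20,
                  group = letters[sample(1:3, 20, replace = TRUE)],
                  stringsAsFactors = FALSE)

# combn() returns a 2-row matrix; each column is one index pair (i, j) with i < j.
idx <- combn(nrow(ddf), 2)

# Keep only the pairs whose two rows belong to different groups.
keep <- ddf$group[idx[1, ]] != ddf$group[idx[2, ]]

uniq_pairs <- data.frame(first  = ddf$value[idx[1, keep]],
                         second = ddf$value[idx[2, keep]])
nrow(uniq_pairs)
```

Because every column of `idx` already has i < j, each between-group pair appears exactly once.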

The number of unique pairs is:

nrow(outdf)
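
The count can also be obtained without enumerating anything, which addresses the "how many lines" part of the question: take all unordered pairs and subtract the pairs that fall inside a single group. For the four cities this gives choose(4, 2) - (1 + 1) = 4. The sketch below checks the formula against brute force; the seed and the variable names are assumptions for illustration.

```r
# Assumption: seed added for reproducibility only.
set.seed(1)
ddf <- data.frame(value = 1:20,
                  group = letters[sample(1:3, 20, replace = TRUE)],
                  stringsAsFactors = FALSE)

n <- nrow(ddf)
group_sizes <- as.vector(table(ddf$group))

# All unordered pairs minus the pairs within the same group.
n_between <- choose(n, 2) - sum(choose(group_sizes, 2))

# Cross-check by enumerating every unordered pair of rows.
idx <- combn(n, 2)
n_brute <- sum(ddf$group[idx[1, ]] != ddf$group[idx[2, ]])

stopifnot(n_between == n_brute)
n_between
```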
