MYSQL:避免在自联接时重复记录的笛卡尔积 [英] MYSQL: Avoiding cartesian product of repeating records when self-joining
问题描述
有两个表:表A和表B.它们具有相同的列,并且数据实际上是相同的.它们都具有自动递增的ID,两者之间的唯一区别是对于相同的记录,它们具有不同的ID.
There are two tables: table A and table B. They have the same columns and the data is practically identical. They both have auto-incremented IDs, the only difference between the two is that they have different IDs for the same records.
在各列中,有一个IDENTIFIER列不是唯一的,即,在两个表中都有(很少)具有相同IDENTIFIER的记录.
Among the columns, there is an IDENTIFIER column which is not unique, i.e. there are (very few) records with the same IDENTIFIER in both tables.
现在,为了找到表A的ID和表B的ID之间的对应关系,我必须将这两个表连接起来(出于所有目的,这是一个自连接), IDENTIFIER列,类似于:
Now, in order to find a correspondence between the IDs of table A and the IDs of table B, I have to join these two tables (for all purposes it's a self-join) on the IDENTIFIER column, something like:
SELECT A.ID, B.ID
FROM A INNER JOIN B ON A.IDENTIFIER = B.IDENTIFIER
但是,由于IDENTIFIER是唯一的,因此会生成IDENTIFIER重复值的所有可能组合,我不希望如此.
理想情况下,我想根据其顺序在具有重复IDENTIFIER值的ID之间生成一对一关联.例如,假设在表A(因此在表B)中有六个具有不同ID和相同IDENTIFIER值的记录:
Ideally, I would like to generate an one to one association between IDs that have repeating IDENTIFIER values, based on their order. For example, supposing that there are six records with different ID and the same IDENTIFIER value in table A (and thus in table B):
A B
IDENTIFIER:'ident105', ID:10 -> IDENTIFIER:'ident105', ID:3
IDENTIFIER:'ident105', ID:20 -> IDENTIFIER:'ident105', ID:400
IDENTIFIER:'ident105', ID:23 -> IDENTIFIER:'ident105', ID:420
IDENTIFIER:'ident105', ID:100 -> IDENTIFIER:'ident105', ID:512
IDENTIFIER:'ident105', ID:120 -> IDENTIFIER:'ident105', ID:513
IDENTIFIER:'ident105', ID:300 -> IDENTIFIER:'ident105', ID:798
那将是理想的. 无论如何,不管ID的顺序如何,都可以生成一对一的关联(但不是首选).
That would be ideal. Anyway, a way to generate a one to one association regardless of the order of the IDs would still be ok (but not preferred).
感谢您的时间,
西尔维奥
推荐答案
select a_numbered.id, a_numbered.identifier, b_numbered.id from
(
select a.*,
case
when @identifier = a.identifier then @rownum := @rownum + 1
else @rownum := 1
end as rn,
@identifier := a.identifier
from a
join (select @rownum := 0, @identifier := null) r
order by a.identifier
) a_numbered join (
select b.*,
case
when @identifier = b.identifier then @rownum := @rownum + 1
else @rownum := 1
end as rn,
@identifier := b.identifier
from b
join (select @rownum := 0, @identifier := null) r
order by b.identifier
) b_numbered
on a_numbered.rn=b_numbered.rn and a_numbered.identifier=b_numbered.identifier
这篇关于MYSQL:避免在自联接时重复记录的笛卡尔积的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!