mysql两行之间的匹配数 [英] The number of matches between two rows mysql
问题描述
所以,这是挑战:
我有两个表:
Etalon:
+-----+-----+-----+-----+----+
| e1 | e2 | e3 | e4 | e5 |
+-----+-----+-----+-----+----+
| 01 | 02 | 03 | 04 | 05 |
+-----+-----+-----+-----+----+
和候选人:
+-----+----+-----+-----+-----+----+----+
| ID | c1 | c2 | c3 | c4 | c5 | nn |
+-----+----+-----+-----+-----+----+----+
| 00 | 03 | 08 | 02 | 01 | 06 | ** |
+-----+----+-----+-----+-----+----+----+
| 01 | 05 | 04 | 03 | 02 | 01 | ** |
+-----+----+-----+-----+-----+----+----+
| 02 | 06 | 07 | 08 | 09 | 10 | ** |
+-----+----+-----+-----+-----+----+----+
| 03 | 08 | 06 | 09 | 02 | 07 | ** |
+-----+----+-----+-----+-----+----+----+
我应该使用什么请求来查找并保存(在nn列中)每行中两行(e1,e2,e3,e4,e5和c1,c2,c3,c4,c5)之间的匹配数表候选人?
What request should I use, to find and save (in nn column) the number of matches between two rows (e1, e2, e3, e4, e5 and c1, c2, c3, c4, c5) for each row in table candidate?
应该是下一个结果:
候选人:
|-----|----|-----|-----|-----|-----|----|
| ID | c1 | c2 | c3 | c4 | c5 | nn |
|-----|----|-----|-----|-----|-----|----|
| 00 | 03 | 08 | 02 | 01 | 06 | 03 |
|-----|----|-----|-----|-----|-----|----|
| 01 | 05 | 04 | 03 | 02 | 01 | 05 |
|-----|----|-----|-----|-----|-----|----|
| 02 | 06 | 07 | 08 | 09 | 10 | 00 |
|-----|----|-----|-----|-----|-----|----|
| 03 | 08 | 06 | 09 | 02 | 07 | 01 |
|-----|----|-----|-----|-----|-----|----|
nn的结果是:
0 - no matches
1,2,3,4,5 - numbers of matches
我该如何实现?
推荐答案
目标是在主表和客户端表的每一行之间建立最大的部分匹配,而不必考虑各个列的标识.
The objective is to establish a maximal partial matching between the master row and each row of the client table without regard to the respective column identities.
这个想法是通过以另一种方式表示列内容来抽象列ID.正如您指出值域为{1, ..., 10}
一样,您可以选择前10个素数{p_1, ...,p_10} = { 2, 3, 5, 7, 11, 13, 17, 19, 23, 29 }
,将i
映射到p_i
.比较将基于映射列值的乘积.这种方法利用素数分解的唯一性,即.每个正整数都会分解为唯一的素数多集.
The idea is to abstract away from the column ids by representing the column contents in another way. As you indicated that the value domain is {1, ..., 10}
, one may choose the first 10 prime numbers {p_1, ...,p_10} = { 2, 3, 5, 7, 11, 13, 17, 19, 23, 29 }
, mapping i
to p_i
. The comparisons will be based on the product of the mapped column values. This approach exploits the uniqueness of prime factorization, ie. every positive integer factorizes into a unique multi-set of prime numbers.
写下一遍独立的SQL更新语句非常麻烦,因此我们创建了一个临时表,其中包含映射值的乘积:
A one-pass standalone sql update statement is rather cumbersome to write down, therefore we create a temporary table that contains the products of the mapped values:
CREATE TEMPORARY TABLE t_pp (
id NUMBER
, mp_candidates NUMBER
, mp_etalon NUMBER
, nn NUMBER
);
INSERT INTO t_pp ( id, mp_candidates, mp_etalon )
SELECT id
, CASE c1
WHEN 1 THEN 2
WHEN 2 THEN 3
WHEN 3 THEN 5
WHEN 4 THEN 7
WHEN 5 THEN 11
WHEN 6 THEN 13
WHEN 7 THEN 17
WHEN 8 THEN 19
WHEN 9 THEN 23
WHEN 10 THEN 29
ELSE 31
END
* CASE c2 WHEN 2 THEN 3 WHEN 3 THEN 5 WHEN 4 THEN 7 WHEN 5 THEN 11 WHEN 6 THEN 13 WHEN 7 THEN 17 WHEN 8 THEN 19 WHEN 9 THEN 23 WHEN 10 THEN 29 ELSE 31 END
* CASE c3 WHEN 2 THEN 3 WHEN 3 THEN 5 WHEN 4 THEN 7 WHEN 5 THEN 11 WHEN 6 THEN 13 WHEN 7 THEN 17 WHEN 8 THEN 19 WHEN 9 THEN 23 WHEN 10 THEN 29 ELSE 31 END
* CASE c4 WHEN 2 THEN 3 WHEN 3 THEN 5 WHEN 4 THEN 7 WHEN 5 THEN 11 WHEN 6 THEN 13 WHEN 7 THEN 17 WHEN 8 THEN 19 WHEN 9 THEN 23 WHEN 10 THEN 29 ELSE 31 END
* CASE c5 WHEN 2 THEN 3 WHEN 3 THEN 5 WHEN 4 THEN 7 WHEN 5 THEN 11 WHEN 6 THEN 13 WHEN 7 THEN 17 WHEN 8 THEN 19 WHEN 9 THEN 23 WHEN 10 THEN 29 ELSE 31 END
mp_candidates
, CASE e1
WHEN 1 THEN 2
WHEN 2 THEN 3
WHEN 3 THEN 5
WHEN 4 THEN 7
WHEN 5 THEN 11
WHEN 6 THEN 13
WHEN 7 THEN 17
WHEN 8 THEN 19
WHEN 9 THEN 23
WHEN 10 THEN 29
ELSE 31
END
* CASE e2 WHEN 2 THEN 3 WHEN 3 THEN 5 WHEN 4 THEN 7 WHEN 5 THEN 11 WHEN 6 THEN 13 WHEN 7 THEN 17 WHEN 8 THEN 19 WHEN 9 THEN 23 WHEN 10 THEN 29 ELSE 31 END
* CASE e3 WHEN 2 THEN 3 WHEN 3 THEN 5 WHEN 4 THEN 7 WHEN 5 THEN 11 WHEN 6 THEN 13 WHEN 7 THEN 17 WHEN 8 THEN 19 WHEN 9 THEN 23 WHEN 10 THEN 29 ELSE 31 END
* CASE e4 WHEN 2 THEN 3 WHEN 3 THEN 5 WHEN 4 THEN 7 WHEN 5 THEN 11 WHEN 6 THEN 13 WHEN 7 THEN 17 WHEN 8 THEN 19 WHEN 9 THEN 23 WHEN 10 THEN 29 ELSE 31 END
* CASE e5 WHEN 2 THEN 3 WHEN 3 THEN 5 WHEN 4 THEN 7 WHEN 5 THEN 11 WHEN 6 THEN 13 WHEN 7 THEN 17 WHEN 8 THEN 19 WHEN 9 THEN 23 WHEN 10 THEN 29 ELSE 31 END
mp_etalon
, 0 nn
FROM candidates
CROSS JOIN etalon
;
现在进入通行证#2-计算比赛次数:
Now for pass #2 - counting matches:
UPDATE t_pp
SET nn =
CASE WHEN mp_candidates MOD 2 = 0 AND mp_etalon MOD 2 = 0 THEN 1 ELSE 0 END
+ CASE WHEN mp_candidates MOD 3 = 0 AND mp_etalon MOD 3 = 0 THEN 1 ELSE 0 END
+ CASE WHEN mp_candidates MOD 5 = 0 AND mp_etalon MOD 5 = 0 THEN 1 ELSE 0 END
+ CASE WHEN mp_candidates MOD 7 = 0 AND mp_etalon MOD 7 = 0 THEN 1 ELSE 0 END
+ CASE WHEN mp_candidates MOD 11 = 0 AND mp_etalon MOD 11 = 0 THEN 1 ELSE 0 END
+ CASE WHEN mp_candidates MOD 13 = 0 AND mp_etalon MOD 13 = 0 THEN 1 ELSE 0 END
+ CASE WHEN mp_candidates MOD 17 = 0 AND mp_etalon MOD 17 = 0 THEN 1 ELSE 0 END
+ CASE WHEN mp_candidates MOD 19 = 0 AND mp_etalon MOD 19 = 0 THEN 1 ELSE 0 END
+ CASE WHEN mp_candidates MOD 23 = 0 AND mp_etalon MOD 23 = 0 THEN 1 ELSE 0 END
+ CASE WHEN mp_candidates MOD 29 = 0 AND mp_etalon MOD 29 = 0 THEN 1 ELSE 0 END
;
最后,将结果转移到原始表并进行清理:
Finally, transferring the results to the original table and cleaning up:
UPDATE candidates c
set nn = ( SELECT p.nn FROM t_pp p WHERE p.id = c.id )
;
DELETE TEMPORARY TABLE t_pp;
更多注意事项:
- 所示方案假定单元格值在每一行中都是唯一的.但是,可以很容易地扩展它以允许多次出现值.
- 原则上,可以将其包装在单个sql语句中-出于明显的原因,不建议这样做.
- 除mysql以外的Rdbms遵循sql标准,并提供了
WITH
子句,从而消除了对temporaray表的需要. - 上述
CASE
表达式的ELSE
分支中的值31
是伪值.
- The scheme as shown assumes that cell values are unique within each row. However, it can easily be extended to allow formultiple occurrences of values.
- In principle, this can be wrapped in a single sql statement - for obvious reasons this is not recommended.
- Rdbms other than mysql follow the sql standard and provide the
WITH
clause that obviates the need for a temporaray table. - The value
31
in theELSE
branch of the aboveCASE
expressions is a dummy value.
这篇关于mysql两行之间的匹配数的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!