mysql两行之间的匹配数 [英] The number of matches between two rows mysql

查看:86
本文介绍了mysql两行之间的匹配数的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

所以,这是挑战:

我有两个表:

Etalon:

+-----+-----+-----+-----+----+
|  e1 |  e2 |  e3 |  e4 | e5 |
+-----+-----+-----+-----+----+
|  01 |  02 |  03 |  04 | 05 |
+-----+-----+-----+-----+----+

和候选人:

+-----+----+-----+-----+-----+----+----+
| ID  | c1 | c2  | c3  | c4  | c5 | nn |
+-----+----+-----+-----+-----+----+----+
| 00  | 03 | 08  | 02  | 01  | 06 | ** |
+-----+----+-----+-----+-----+----+----+
| 01  | 05 | 04  | 03  | 02  | 01 | ** |
+-----+----+-----+-----+-----+----+----+
| 02  | 06 | 07  | 08  | 09  | 10 | ** |
+-----+----+-----+-----+-----+----+----+
| 03  | 08 | 06  | 09  | 02  | 07 | ** |
+-----+----+-----+-----+-----+----+----+

我应该使用什么请求来查找并保存(在nn列中)每行中两行(e1,e2,e3,e4,e5和c1,c2,c3,c4,c5)之间的匹配数表候选人?

What request should I use, to find and save (in nn column) the number of matches between two rows (e1, e2, e3, e4, e5 and c1, c2, c3, c4, c5) for each row in table candidate?

应该是下一个结果:

候选人:

|-----|----|-----|-----|-----|-----|----|
| ID  | c1 | c2  | c3  | c4  | c5  | nn |
|-----|----|-----|-----|-----|-----|----|
| 00  | 03 | 08  | 02  | 01  | 06  | 03 |
|-----|----|-----|-----|-----|-----|----|
| 01  | 05 | 04  | 03  | 02  | 01  | 05 |
|-----|----|-----|-----|-----|-----|----|
| 02  | 06 | 07  | 08  | 09  | 10  | 00 |
|-----|----|-----|-----|-----|-----|----|
| 03  | 08 | 06  | 09  | 02  | 07  | 01 |
|-----|----|-----|-----|-----|-----|----|

nn的结果是:

0 - no matches
1,2,3,4,5 - numbers of matches 

我该如何实现?

推荐答案

目标是在主表和客户端表的每一行之间建立最大的部分匹配,而不必考虑各个列的标识.

The objective is to establish a maximal partial matching between the master row and each row of the client table without regard to the respective column identities.

这个想法是通过以另一种方式表示列内容来抽象列ID.正如您指出值域为{1, ..., 10}一样,您可以选择前10个素数{p_1, ...,p_10} = { 2, 3, 5, 7, 11, 13, 17, 19, 23, 29 },将i映射到p_i.比较将基于映射列值的乘积.这种方法利用素数分解的唯一性,即.每个正整数都会分解为唯一的素数多集.

The idea is to abstract away from the column ids by representing the column contents in another way. As you indicated that the value domain is {1, ..., 10}, one may choose the first 10 prime numbers {p_1, ...,p_10} = { 2, 3, 5, 7, 11, 13, 17, 19, 23, 29 }, mapping i to p_i. The comparisons will be based on the product of the mapped column values. This approach exploits the uniqueness of prime factorization, ie. every positive integer factorizes into a unique multi-set of prime numbers.

写下一遍独立的SQL更新语句非常麻烦,因此我们创建了一个临时表,其中包含映射值的乘积:

A one-pass standalone sql update statement is rather cumbersome to write down, therefore we create a temporary table that contains the products of the mapped values:

CREATE TEMPORARY TABLE t_pp (
      id            NUMBER
    , mp_candidates NUMBER
    , mp_etalon     NUMBER
    , nn            NUMBER
);
INSERT INTO t_pp ( id, mp_candidates, mp_etalon )
     SELECT id
          ,   CASE c1
                  WHEN  1 THEN  2
                  WHEN  2 THEN  3
                  WHEN  3 THEN  5
                  WHEN  4 THEN  7
                  WHEN  5 THEN 11
                  WHEN  6 THEN 13
                  WHEN  7 THEN 17
                  WHEN  8 THEN 19
                  WHEN  9 THEN 23
                  WHEN 10 THEN 29
                  ELSE         31
              END
            * CASE c2 WHEN  2 THEN  3 WHEN  3 THEN  5 WHEN  4 THEN  7 WHEN  5 THEN 11 WHEN  6 THEN 13 WHEN  7 THEN 17 WHEN  8 THEN 19 WHEN  9 THEN 23 WHEN 10 THEN 29 ELSE 31 END
            * CASE c3 WHEN  2 THEN  3 WHEN  3 THEN  5 WHEN  4 THEN  7 WHEN  5 THEN 11 WHEN  6 THEN 13 WHEN  7 THEN 17 WHEN  8 THEN 19 WHEN  9 THEN 23 WHEN 10 THEN 29 ELSE 31 END
            * CASE c4 WHEN  2 THEN  3 WHEN  3 THEN  5 WHEN  4 THEN  7 WHEN  5 THEN 11 WHEN  6 THEN 13 WHEN  7 THEN 17 WHEN  8 THEN 19 WHEN  9 THEN 23 WHEN 10 THEN 29 ELSE 31 END
            * CASE c5 WHEN  2 THEN  3 WHEN  3 THEN  5 WHEN  4 THEN  7 WHEN  5 THEN 11 WHEN  6 THEN 13 WHEN  7 THEN 17 WHEN  8 THEN 19 WHEN  9 THEN 23 WHEN 10 THEN 29 ELSE 31 END
                mp_candidates

          ,   CASE e1
                  WHEN  1 THEN  2
                  WHEN  2 THEN  3
                  WHEN  3 THEN  5
                  WHEN  4 THEN  7
                  WHEN  5 THEN 11
                  WHEN  6 THEN 13
                  WHEN  7 THEN 17
                  WHEN  8 THEN 19
                  WHEN  9 THEN 23
                  WHEN 10 THEN 29
                  ELSE         31
              END
            * CASE e2 WHEN  2 THEN  3 WHEN  3 THEN  5 WHEN  4 THEN  7 WHEN  5 THEN 11 WHEN  6 THEN 13 WHEN  7 THEN 17 WHEN  8 THEN 19 WHEN  9 THEN 23 WHEN 10 THEN 29 ELSE 31 END
            * CASE e3 WHEN  2 THEN  3 WHEN  3 THEN  5 WHEN  4 THEN  7 WHEN  5 THEN 11 WHEN  6 THEN 13 WHEN  7 THEN 17 WHEN  8 THEN 19 WHEN  9 THEN 23 WHEN 10 THEN 29 ELSE 31 END
            * CASE e4 WHEN  2 THEN  3 WHEN  3 THEN  5 WHEN  4 THEN  7 WHEN  5 THEN 11 WHEN  6 THEN 13 WHEN  7 THEN 17 WHEN  8 THEN 19 WHEN  9 THEN 23 WHEN 10 THEN 29 ELSE 31 END
            * CASE e5 WHEN  2 THEN  3 WHEN  3 THEN  5 WHEN  4 THEN  7 WHEN  5 THEN 11 WHEN  6 THEN 13 WHEN  7 THEN 17 WHEN  8 THEN 19 WHEN  9 THEN 23 WHEN 10 THEN 29 ELSE 31 END
                mp_etalon
          , 0   nn
       FROM candidates
 CROSS JOIN etalon     
          ;

现在进入通行证#2-计算比赛次数:

Now for pass #2 - counting matches:

UPDATE t_pp
   SET nn =
             CASE WHEN mp_candidates MOD  2 = 0 AND mp_etalon MOD  2 = 0  THEN 1 ELSE 0 END
           + CASE WHEN mp_candidates MOD  3 = 0 AND mp_etalon MOD  3 = 0  THEN 1 ELSE 0 END
           + CASE WHEN mp_candidates MOD  5 = 0 AND mp_etalon MOD  5 = 0  THEN 1 ELSE 0 END
           + CASE WHEN mp_candidates MOD  7 = 0 AND mp_etalon MOD  7 = 0  THEN 1 ELSE 0 END
           + CASE WHEN mp_candidates MOD 11 = 0 AND mp_etalon MOD 11 = 0  THEN 1 ELSE 0 END
           + CASE WHEN mp_candidates MOD 13 = 0 AND mp_etalon MOD 13 = 0  THEN 1 ELSE 0 END
           + CASE WHEN mp_candidates MOD 17 = 0 AND mp_etalon MOD 17 = 0  THEN 1 ELSE 0 END
           + CASE WHEN mp_candidates MOD 19 = 0 AND mp_etalon MOD 19 = 0  THEN 1 ELSE 0 END
           + CASE WHEN mp_candidates MOD 23 = 0 AND mp_etalon MOD 23 = 0  THEN 1 ELSE 0 END
           + CASE WHEN mp_candidates MOD 29 = 0 AND mp_etalon MOD 29 = 0  THEN 1 ELSE 0 END
     ;

最后,将结果转移到原始表并进行清理:

Finally, transferring the results to the original table and cleaning up:

UPDATE candidates c
   set nn = ( SELECT p.nn FROM t_pp p WHERE p.id = c.id )
     ;
DELETE TEMPORARY TABLE t_pp;

更多注意事项:

  • 所示方案假定单元格值在每一行中都是唯一的.但是,可以很容易地扩展它以允许多次出现值.
  • 原则上,可以将其包装在单个sql语句中-出于明显的原因,不建议这样做.
  • 除mysql以外的Rdbms遵循sql标准,并提供了WITH子句,从而消除了对temporaray表的需要.
  • 上述CASE表达式的ELSE分支中的值31是伪值.
  • The scheme as shown assumes that cell values are unique within each row. However, it can easily be extended to allow formultiple occurrences of values.
  • In principle, this can be wrapped in a single sql statement - for obvious reasons this is not recommended.
  • Rdbms other than mysql follow the sql standard and provide the WITH clause that obviates the need for a temporaray table.
  • The value 31 in the ELSE branch of the above CASE expressions is a dummy value.

这篇关于mysql两行之间的匹配数的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆