使用n:m-relation复制数据集 [英] Copy datasets with n:m-relation

查看:74
本文介绍了使用n:m-relation复制数据集的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我想使用单个SQL语句

I would like to use the single SQL-statement

insert into T (...) select ... from T where ...

复制大量数据集.我的问题是从表T到其他表有N:M关系,并且这些关系也必须被复制.如果我不知道哪个原始数据集属于哪个复制的数据集,该怎么办?让我以身作则.

to copy a lot of datasets. My problem is that there are N:M-relationships from table T to other tables and these have to be copied too. How can I do this, if I do not know which original dataset belongs to which copied dataset? Let me demonstrate by example.

T:

ID  | COL1 | COL2    
-----------------
1   | A    | B
2   | C    | D

N:M表从表T引用表U(未显示表U):

N:M-table references table U from table T (table U is not shown):

T   | U              
---------
1   | 100
1   | 101
2   | 100
2   | 102

我的复制操作,其中[???]是我不知道的部分:

My copy operation where [???] is the part I do not know:

insert into T (COL1, COL2) select COL1, COL2 from T
insert into NM (T, U) select [???]

之后的数据库内容:

T:

ID  | COL1 | COL2
-----------------
1   | A    | B
2   | C    | D
3   | A    | B
4   | C    | D

N:M表:

T   | U
---------
1   | 100
1   | 101
2   | 100
2   | 102
3   | 100
3   | 101
4   | 100
4   | 102

注意:

  • 我有数千个数据集(不只是两个)
  • 我想使用'insert ... select'以获得更好的性能

推荐答案

如果幸运的是,您可以运行当前的 PostgreSQL 9.1 ,则可以通过单个命令找到一种简洁,快速的解决方案使用新的 修改数据的CTE .

If you are lucky enough to run the current PostgreSQL 9.1, there is an elegant and fast solution with a single command using the new data-modifying CTEs.

不支持

No such luck with MySQL which does not support Common Table Expressions (CTE), not to mention data-modifying CTE.

假设(col1, col2)最初是唯一的:

  • 在这种情况下,您可以轻松地从表中选择任意切片.
  • 不会浪费t.id的序列号.
  • You can easily pick arbitrary slices from the table in this case.
  • No sequence numbers for t.id will be wasted.

WITH s AS (
    SELECT id, col1, col2
    FROM   t
--  WHERE  some condition
    )
    ,i AS (
    INSERT INTO t (col1, col2)
    SELECT col1, col2   -- I gather from comments that id is a serial column
    FROM   s
    RETURNING id, col1, col2
    )
INSERT INTO tu (t, u)
SELECT i.id, tu.u
FROM   tu
JOIN   s ON tu.t = s.id
JOIN   i USING (col1, col2);

如果(col1, col2) 不是唯一,我会看到另外两种方式:

If (col1, col2) is not unique, I see two other ways:

  • Use the window function row_number() to make non-unique rows unique.
  • INSERT rows without holes in the t.id space just like in the query above.

WITH s AS (
    SELECT id, col1, col2
         , row_number() OVER (PARTITION BY col1, col2) AS rn
    FROM   t
--  WHERE some condition
    )
    ,i AS (
    INSERT INTO t (col1, col2)
    SELECT col1, col2
    FROM   s
    RETURNING id, col1, col2
    )
    ,r AS (
    SELECT *
         , row_number() OVER (PARTITION BY col1, col2) AS rn
    FROM   i
    )
INSERT INTO tu (t, u)
SELECT r.id, tu.u
FROM   r
JOIN   s USING (col1, col2, rn)    -- match exactly one id per row
JOIN   tu ON tu.t = s.id;

查询3

  • 这基于@ypercube已经提供的相同思想,但是全部在一个查询中.
  • 如果当前t.id的数字空间中有孔,则会相应地为新行刻录序列号.
  • 别忘了重设序列,使其超过新的最大值,否则,t中的新插入物会重复出现键错误,这些错误会从序列中提取id的默认值.我将此作为命令的最后一步进行了集成.最快的这样最安全.
  • Query 3

    • This is based on the same idea that @ypercube already supplied, but all in one query.
    • If there are holes in numbers space for current t.id, sequence numbers will be burnt for the new rows accordingly.
    • Don't forget to reset your sequence beyond the new maximum or you will get duplicate key errors for new inserts in t that draw the default for id from the sequence. I integrated this as final step into the command. Fastest & safest this way.
    • WITH s AS (
          SELECT max(id) AS max_id
          FROM   t
          )
          ,i AS (
          INSERT INTO t (id, col1, col2)
          SELECT id + s.max_id, col1, col2
          FROM   t, s
          )
          ,j AS (
          INSERT INTO tu (t, u)
          SELECT tu.t + s.max_id, tu.u
          FROM   tu, s
          )
      SELECT setval('t_id_seq', s.max_id + s.max_id)
      FROM   s;
      

      手册中有关 setval()的详细信息.

      Details about setval() in the manual.

      进行快速测试.

      CREATE TEMP TABLE t (id serial primary key, col1 text, col2 text);
      INSERT INTO t (col1, col2) VALUES 
       ('A', 'B')
      ,('C', 'D');
      
      CREATE TEMP TABLE tu (t int, u int);
      INSERT INTO tu VALUES
       (1, 100)
      ,(1, 101)
      ,(2, 100)
      ,(2, 102);
      
      SELECT * FROM t;
      SELECT * FROM tu;
      

      最近有一个类似的问题,我在其中提供了一个类似的答案.以及版本 8.3 的替代版本,其中没有CTE和窗口功能.

      There was a somewhat similar question recently, where I provided a somewhat similar answer. Plus alternatives for version 8.3 without CTEs and window functions.

      这篇关于使用n:m-relation复制数据集的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆