Insert data and set foreign keys with Postgres

Question

I have to migrate a large amount of existing data in a Postgres DB after a schema change.

In the old schema, a country attribute was stored in the users table. Now the country attribute has been moved into a separate addresses table:

users:
  country # OLD
  address_id # NEW [1:1 relation]

addresses:
  id
  country

The schema is actually more complex, and an address contains more than just the country. Thus, every user needs to have their own address (1:1 relation).

When migrating the data, I'm having problems setting the foreign keys in the users table after inserting the addresses:

INSERT INTO addresses (country) 
    SELECT country FROM users WHERE address_id IS NULL 
    RETURNING id;

How do I propagate the IDs of the inserted rows and set the foreign key references in the users table?

The only solution I could come up with so far is creating a temporary user_id column in the addresses table and then updating the address_id:

UPDATE users SET address_id = a.id FROM addresses AS a 
    WHERE users.id = a.user_id;

However, this turned out to be extremely slow (despite using indices on both users.id and addresses.user_id).

The users table contains about 3 million rows, 300k of which are missing an associated address.

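Is there any other way to insert derived data into one table and set the foreign key reference to the inserted data in another table (without changing the schema itself)?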

I'm using Postgres 8.3.14.

Thanks

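I have now solved the problem by migrating the data with a Python/sqlalchemy script. For me, that turned out to be much easier than trying to do it in SQL. Still, I'd be interested if anybody knows a way to process the RETURNING result of an INSERT statement in Postgres SQL.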
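The script itself is not shown in the question. As a rough illustration of what a row-by-row migration of this kind might look like, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for Postgres and sqlalchemy. The function names, the demo schema, and the sample rows are assumptions made for this example. Against Postgres, one would presumably read the new key with INSERT ... RETURNING id instead of cursor.lastrowid.

```python
import sqlite3


def migrate_countries(conn):
    """Give every user without an address its own new address row.

    Hypothetical helper: assumes tables users(id, country, address_id)
    and addresses(id, country), as outlined in the question.
    """
    cur = conn.cursor()
    # Fetch the work list up front so users is not modified while iterating.
    pending = cur.execute(
        "SELECT id, country FROM users WHERE address_id IS NULL"
    ).fetchall()
    for user_id, country in pending:
        cur.execute("INSERT INTO addresses (country) VALUES (?)", (country,))
        new_address_id = cur.lastrowid  # key of the row just inserted
        cur.execute(
            "UPDATE users SET address_id = ? WHERE id = ?",
            (new_address_id, user_id),
        )
    conn.commit()
    return len(pending)


def demo():
    """Build a tiny in-memory database and run the migration on it."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE addresses (id INTEGER PRIMARY KEY, country TEXT);
        CREATE TABLE users (
            id         INTEGER PRIMARY KEY,
            country    TEXT,
            address_id INTEGER REFERENCES addresses (id)
        );
        INSERT INTO addresses (id, country) VALUES (1, 'FR');
        INSERT INTO users (id, country, address_id) VALUES
            (1, 'FR', 1), (2, 'DE', NULL), (3, 'DE', NULL), (4, 'US', NULL);
    """)
    migrated = migrate_countries(conn)
    rows = conn.execute("""
        SELECT u.id, u.country, a.country, u.address_id
        FROM   users u
        JOIN   addresses a ON a.id = u.address_id
        ORDER  BY u.id
    """).fetchall()
    return migrated, rows


if __name__ == "__main__":
    print(demo())
```

One round trip per row is simple to reason about, but with 300k rows it will likely be much slower than a set-based statement.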

Recommended answer

The table users must have some primary key that you did not disclose. For the purpose of this answer I will name it users_id.

You can use data-modifying CTEs, which are available since PostgreSQL 9.1.

If we can assume that country is unique, the whole operation is rather trivial:

WITH i AS (
    INSERT INTO addresses (country) 
    SELECT country
    FROM   users
    WHERE  address_id IS NULL 
    RETURNING id, country
    )
UPDATE users u
SET    address_id = i.id
FROM   i
WHERE  i.country = u.country;

You mention version 8.3 in your question. If you have not gotten around to upgrading in the meantime, you might want to consider it. End of life is coming soon for 8.3.

Be that as it may, this is simple enough with version 8.3. You just need two statements:

INSERT INTO addresses (country) 
SELECT country
FROM   users
WHERE  address_id IS NULL;

UPDATE users u
SET    address_id = a.id
FROM   addresses a
WHERE  u.address_id IS NULL
AND    a.country = u.country;

If country is not unique, it becomes more challenging. You could just create one address and link to it multiple times. But you did mention a 1:1 relationship, which rules out such a convenient solution.

For version 9.1:

WITH s AS (
    SELECT users_id, country
         , row_number() OVER (PARTITION BY country) AS rn
    FROM   users
    WHERE  address_id IS NULL 
    )
    , i AS (
    INSERT INTO addresses (country) 
    SELECT country
    FROM   s
    RETURNING id, country
    )
    , r AS (
    SELECT *
         , row_number() OVER (PARTITION BY country) AS rn
    FROM   i
    )
UPDATE users u
SET    address_id = r.id
FROM   r
JOIN   s USING (country, rn)    -- select exactly one id for every user
WHERE  u.users_id = s.users_id
AND    u.address_id IS NULL;

As there is no way to unambiguously assign exactly one id returned from the INSERT to every user in a set with identical country, I use the window function row_number() to make them unique.
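To make the pairing concrete, here is a small plain-Python illustration of the idea (not Postgres code). It numbers the users and the newly inserted addresses within each country, then joins the two sets on (country, rn), mirroring the CTEs s and r. The helper names and the sample ids are made up for this sketch.

```python
from collections import defaultdict


def number_within_country(rows):
    """Mimic row_number() OVER (PARTITION BY country).

    rows is an iterable of (key, country) pairs; the result maps
    (country, rn) to key, with rn counting from 1 inside each country.
    """
    counters = defaultdict(int)
    numbered = {}
    for key, country in rows:
        counters[country] += 1
        numbered[(country, counters[country])] = key
    return numbered


def pair_users_with_addresses(users, new_addresses):
    s = number_within_country(users)          # (country, rn) -> users_id
    r = number_within_country(new_addresses)  # (country, rn) -> address id
    # Equivalent of: FROM r JOIN s USING (country, rn)
    return {s[k]: r[k] for k in s if k in r}


# Hypothetical sample data: two users share the country 'DE'.
users = [(10, "DE"), (11, "FR"), (12, "DE")]
new_addresses = [(501, "FR"), (502, "DE"), (503, "DE")]

mapping = pair_users_with_addresses(users, new_addresses)
print(mapping)  # {10: 502, 11: 501, 12: 503}
```

Which of the two 'DE' users receives which 'DE' address is arbitrary. That is fine here, because the freshly inserted addresses for one country are interchangeable.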

Not as straightforward with version 8.3. One possible way:

INSERT INTO addresses (country) 
SELECT DISTINCT country -- pick just one per set of dupes
FROM   users
WHERE  address_id IS NULL;

UPDATE users u
SET    address_id = a.id
FROM   addresses a
WHERE  a.country = u.country
AND    u.address_id IS NULL
AND NOT EXISTS (            -- address a is not linked to any user yet
    SELECT 1 FROM users x
    WHERE  x.address_id = a.id
    )
AND NOT EXISTS (            -- u has the smallest users_id in its set of dupes
    SELECT 1 FROM users b
    WHERE  b.country = u.country
    AND    b.address_id IS NULL
    AND    b.users_id < u.users_id
    );

Repeat this until the last NULL value is gone from users.address_id.
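The repetition could be driven from a small script. The following is a sketch only: it uses Python's built-in sqlite3 module as a stand-in for Postgres, and it picks the user/address pairs with a SELECT before applying them, rather than running the UPDATE ... FROM statement above. The function names, the demo schema, and the sample rows are assumptions for this example.

```python
import sqlite3

INSERT_ONE_PER_COUNTRY = """
    INSERT INTO addresses (country)
    SELECT DISTINCT country FROM users WHERE address_id IS NULL
"""

# For every address not yet linked to a user, pick the unlinked user
# with the smallest users_id in the same country.
PICK_PAIRS = """
    SELECT a.id, MIN(u.users_id)
    FROM   addresses a
    JOIN   users u ON u.country = a.country
    WHERE  u.address_id IS NULL
    AND    NOT EXISTS (SELECT 1 FROM users x WHERE x.address_id = a.id)
    GROUP  BY a.id
"""

COUNT_MISSING = "SELECT count(*) FROM users WHERE address_id IS NULL"


def migrate_in_rounds(conn):
    """Repeat insert + link until no user lacks an address; return the round count."""
    rounds = 0
    while conn.execute(COUNT_MISSING).fetchone()[0]:
        conn.execute(INSERT_ONE_PER_COUNTRY)
        pairs = conn.execute(PICK_PAIRS).fetchall()
        if not pairs:
            # Nothing can be linked (e.g. only NULL countries left): stop looping.
            break
        conn.executemany(
            "UPDATE users SET address_id = ? WHERE users_id = ?", pairs
        )
        rounds += 1
    conn.commit()
    return rounds


def demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE addresses (id INTEGER PRIMARY KEY, country TEXT);
        CREATE TABLE users (
            users_id   INTEGER PRIMARY KEY,
            country    TEXT NOT NULL,
            address_id INTEGER REFERENCES addresses (id)
        );
        INSERT INTO users (users_id, country) VALUES
            (1, 'DE'), (2, 'DE'), (3, 'DE'), (4, 'FR');
    """)
    rounds = migrate_in_rounds(conn)
    linked = conn.execute(
        "SELECT users_id, address_id FROM users ORDER BY users_id"
    ).fetchall()
    return rounds, linked


if __name__ == "__main__":
    print(demo())
```

In this sketch the number of rounds equals the size of the largest group of users sharing a country (three for the sample data), so the approach gets slow when a single country dominates the table.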
