Postgres bulk INSERT function using JSON arguments


Question

Here's a plpgsql function for Postgres 9.6. It tries to INSERT a row, and if the insert doesn't fail (due to a key constraint violation), it then runs a few more commands.

CREATE FUNCTION foo(int, text, text)
  RETURNS void AS
$$
BEGIN
  INSERT INTO table1 (id, val1, val2) VALUES ($1, $2, $3) ON CONFLICT DO NOTHING;
  IF FOUND THEN
    INSERT INTO table2 (table1_id, val1) VALUES ($1, $2);
    UPDATE table3 SET (val2, time) = ($3, now()) WHERE table1_id = $1;
  END IF;
END
$$ LANGUAGE plpgsql;

This function processes a single record, but how could you modify it to process a batch of thousands of records?

I found an answer, which suggests making each of the 3 function arguments an array. But is there a way to do it where I'm passing in arguments that more closely represent how the records would look in my application?

For example, the ideal solution would be for my application code to call select foo($1), where the parameter $1 is a JSON array of objects, and each inner object is a record to be inserted.

[ 
  { "id": "1", "val1": "1-val1", "val2": "1-val2" },
  { "id": "2", "val1": "2-val1", "val2": "2-val2" },
  { "id": "3", "val1": "3-val1", "val2": "3-val2" },
  { "id": "4", "val1": "4-val1", "val2": "4-val2" }
]

The second-best option would be for my application code to call select foo($1, $2, $3, $4), where each parameter is a JSON object corresponding to a record to be inserted.

{ "id": "1", "val1": "1-val1", "val2": "1-val2" }  // This would be $1
{ "id": "2", "val1": "2-val1", "val2": "2-val2" }  // This would be $2

I'm looking at the various JSON functions offered by Postgres here, and they seem relevant, but I can't figure out exactly which to use. Is what I'm looking to do even possible? Would using JSON arrays instead of JSON objects anywhere make this possible?
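For reference, the application side could build such a batch payload along these lines (a minimal Python sketch; the driver call is hypothetical and shown only as a comment):

```python
import json

# Records as they look in the application: one dict per row to insert.
records = [
    {"id": "1", "val1": "1-val1", "val2": "1-val2"},
    {"id": "2", "val1": "2-val1", "val2": "2-val2"},
]

# Serialize the whole batch into a single JSON-array parameter.
payload = json.dumps(records)

# With a driver such as psycopg2 (hypothetical cursor `cur`):
# cur.execute("SELECT foo(%s)", (payload,))
print(payload)
```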

Answer

For thousands of records

1. Create a temporary table of input rows, comprised of your values $1, $2, $3. The fastest way to upload is COPY - or the \copy meta-command of psql if the data is not on the same machine. Let's suppose this table:

CREATE TEMP TABLE tmp(id int PRIMARY KEY, val1 text, val2 text);

I added a PK constraint, which is totally optional, but it makes sure we are dealing with unique not-null int values. If you can vouch for input data, you don't need the constraint.
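For example, the upload could look like this (the file paths are hypothetical):

```sql
-- Server-side COPY; the file must be readable by the server process:
COPY tmp FROM '/path/to/input.csv' WITH (FORMAT csv, HEADER true);

-- Client-side alternative, psql's \copy meta-command:
-- \copy tmp FROM 'input.csv' WITH (FORMAT csv, HEADER true)
```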

2. Chain your commands with data-modifying CTEs. As we have determined under your previous question, there are no race conditions to take care of in this particular operation.

WITH ins1 AS (
   INSERT INTO table1 AS t1 (id, val1, val2)
   SELECT id, val1, val2 FROM tmp ON CONFLICT DO NOTHING
   RETURNING t1.id, t1.val1, t1.val2  -- only actually inserted rows returned
   )
, ins2 AS (
   INSERT INTO table2 (table1_id, val1)
   SELECT id, val1 FROM ins1
   )
UPDATE table3 t3
SET    val2 = i.val2
     , time = now()
FROM   ins1 i
WHERE  t3.table1_id = i.id;

Steps 1 and 2 must run in the same session (not necessarily the same transaction), since the scope of temporary tables is bound to the session.

Note that the UPDATE depends only on the 1st INSERT; success of the 2nd INSERT is guaranteed, since it has no ON CONFLICT DO NOTHING and the whole operation is rolled back if there is any conflict in the 2nd INSERT.

There are various options for how to pass the values. Your idea to pass a JSON array to a function is one of them. If the objects match the target table, you can use json_populate_recordset() in a single INSERT query. Or just use the INSERT (as a prepared statement) directly, without a function wrapper.

INSERT INTO target_tbl  -- it's ok to omit target columns here
SELECT *
FROM   json_populate_recordset(null::target_tbl,  -- use same table type
          json '[{ "id": "1", "val1": "1-val1", "val2": "1-val2" },
                 { "id": "2", "val1": "2-val1", "val2": "2-val2" },
                 { "id": "3", "val1": "3-val1", "val2": "3-val2" },
                 { "id": "4", "val1": "4-val1", "val2": "4-val2" }]');
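To get all the way to the select foo($1) call from the question, json_populate_recordset() can also feed the CTE chain from step 2 inside one function. This is only a sketch, assuming the json objects match the columns of table1 and the table definitions implied by the question:

```sql
CREATE OR REPLACE FUNCTION foo(_data json)
  RETURNS void AS
$$
WITH ins1 AS (
   INSERT INTO table1 AS t1 (id, val1, val2)
   SELECT id, val1, val2
   FROM   json_populate_recordset(null::table1, _data)
   ON     CONFLICT DO NOTHING
   RETURNING t1.id, t1.val1, t1.val2  -- only actually inserted rows returned
   )
, ins2 AS (
   INSERT INTO table2 (table1_id, val1)
   SELECT id, val1 FROM ins1
   )
UPDATE table3 t3
SET    val2 = i.val2
     , time = now()
FROM   ins1 i
WHERE  t3.table1_id = i.id;
$$ LANGUAGE sql;
```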


For just a handful of columns you might also pass an array for each column and loop through them in parallel. You can do this with a simple loop on the array index. Since Postgres 9.4 there is also the convenient unnest() with multiple parameters to do it all in a single query:
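A sketch of that multi-parameter unnest() variant, with hypothetical literal arrays standing in for the three array parameters:

```sql
INSERT INTO table1 (id, val1, val2)
SELECT *
FROM   unnest('{1,2,3}'::int[]
            , '{1-val1,2-val1,3-val1}'::text[]
            , '{1-val2,2-val2,3-val2}'::text[])
ON     CONFLICT DO NOTHING;
```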

The best solution depends on the data format you have.
