Improving a function that UPSERTs based on an input array

Views: 100 Published: 2020/5/29 21:40:38 Tags: arrays postgresql function sql-injection

Question

I am hoping to get some help improving a method for UPSERTing rows passed in as an array. I'm on Postgres 11.4, deployed on RDS. I've got a lot of tables to sort out, but am starting with a simple table for experimentation:

BEGIN;
DROP TABLE IF EXISTS "data"."item" CASCADE;

CREATE TABLE IF NOT EXISTS "data"."item" (
    "id" uuid NOT NULL DEFAULT NULL,
    "marked_for_deletion" boolean NOT NULL DEFAULT false,
    "name_" citext NOT NULL DEFAULT NULL,

CONSTRAINT item_id_pkey
    PRIMARY KEY ("id")
);

CREATE INDEX item_marked_for_deletion_ix_bgin ON "data"."item" USING GIN("marked_for_deletion") WHERE marked_for_deletion = true;

ALTER TABLE "data"."item" OWNER TO "user_change_structure";
COMMIT;

The function, so far, looks like this:

DROP FUNCTION IF EXISTS data.item_insert_array (item[]);

CREATE OR REPLACE FUNCTION data.item_insert_array (data_in item[]) 
  RETURNS int
AS $$
INSERT INTO item (
    id, 
    marked_for_deletion, 
    name_)

SELECT
    d.id, 
    d.marked_for_deletion,
    d.name_

FROM unnest(data_in) d

ON CONFLICT(id) DO UPDATE SET 
    marked_for_deletion = EXCLUDED.marked_for_deletion,
    name_ = EXCLUDED.name_;

SELECT cardinality(data_in); -- array_length() doesn't work. ¯\_(ツ)_/¯

$$ LANGUAGE sql;

ALTER FUNCTION data.item_insert_array(item[]) OWNER TO user_bender;

A call looks like this:

select * from item_insert_array(

    array[
        ('2f888809-2777-524b-abb7-13df413440f5',true,'Salad fork'),
        ('f2924dda-8e63-264b-be55-2f366d9c3caa',false,'Melon baller'),
        ('d9ecd18d-34fd-5548-90ea-0183a72de849',true,'Fondue fork')
        ]::item[]
    );

I'm trying to develop a system for UPSERT that is injection-safe and that performs well. I'll be replacing a more naive multi-value insert where the INSERT is composed completely on the client side. Meaning, I can't be certain that I'm not introducing defects when concatenating the text. (I asked about this here: Postgres bulk insert/update that's injection-safe. Perhaps a function that takes an array?)

I've gotten this far with the help of various excellent answers:

https://dba.stackexchange.com/questions/224785/pass-array-of-mixed-type-into-storedfunction

https://dba.stackexchange.com/questions/131505/use-array-of-composite-type-as-function-parameter-and-access-it

https://dba.stackexchange.com/questions/225176/

I'm not trying for the most complex version of all of this, for instance, I am fine with a single function per table, and fine that every array element has exactly the same format. I'll write code generators to build out everything I need, once I've got the basic pattern sorted out. So, I don't think that I need VARIADIC parameter lists, polymorphic elements, or everything-packaged-as-JSON. (Although I will need to insert JSON from time to time, that's just data.)

I can still use some remedial help with some questions:

Is the code above injection-safe, or do I need to rewrite it in PL/pgSQL to use something like FOREACH with an EXECUTE...USING or FORMAT or quote_literal, etc.?
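To make the injection question concrete, here is a minimal sketch, assuming the search_path includes the "data" schema (as the call above does) and that the client binds the whole array as one typed parameter. The function body is static SQL with no EXECUTE, so the values in the array are only ever treated as data. The prepared statement name upsert_items is hypothetical, and the literal in the EXECUTE stands in for what a client driver would send as a bound parameter.

```sql
-- The statement text is fixed; the array arrives as a single item[] parameter.
PREPARE upsert_items (item[]) AS
    SELECT item_insert_array($1);

-- A hostile-looking name is just a citext value inside the array.
EXECUTE upsert_items(
    ARRAY[
        ('2f888809-2777-524b-abb7-13df413440f5', false,
         'x''); DROP TABLE item; --')
    ]::item[]
);

-- The text should come back verbatim, and the table should still exist.
SELECT name_
FROM item
WHERE id = '2f888809-2777-524b-abb7-13df413440f5';

DEALLOCATE upsert_items;
```

The remaining risk sits on the client: if the array literal is built by concatenating text into the statement, the function cannot protect against that.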

I'm setting the input array to item[]. That's fine as I'm passing in all of the fields for this tiny table, but I won't always want to pass in all columns. I thought that I could use anyarray as the type within the function, but I can't figure out how to pass in an array in that scenario. Is there a generic array-of-stuff type? I can create custom types for each of these functions, but I'd rather not. Mainly, because I would only use the type in that one situation.

It seems like it would make sense to implement this as a procedure rather than a function so that I can handle the transaction within the function. Am I off base on that?
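For comparison, a rough sketch of the procedure variant, assuming Postgres 11 or later and PL/pgSQL (transaction control is not available in LANGUAGE sql routines). The name item_upsert_array is hypothetical.

```sql
CREATE OR REPLACE PROCEDURE data.item_upsert_array (data_in data.item[])
LANGUAGE plpgsql
AS $$
BEGIN
    INSERT INTO data.item (id, marked_for_deletion, name_)
    SELECT d.id, d.marked_for_deletion, d.name_
    FROM unnest(data_in) AS d
    ON CONFLICT (id) DO UPDATE SET
        marked_for_deletion = EXCLUDED.marked_for_deletion,
        name_ = EXCLUDED.name_;

    -- Only allowed when CALL is not wrapped in an explicit transaction block.
    COMMIT;
END;
$$;

CALL data.item_upsert_array(
    ARRAY[
        ('2f888809-2777-524b-abb7-13df413440f5', true, 'Salad fork')
    ]::data.item[]
);
```

Since the single INSERT ... ON CONFLICT statement is already atomic, the explicit COMMIT mostly matters if the routine later grows to several statements or batches.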

Any stylistic (or other) suggestions on what to return? I'm returning a count now, which is at least a little useful.
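One option, sketched here under the assumption that a count of rows actually written is more useful than the size of the input, is to count what the statement returns. The function name item_upsert_array_counted is hypothetical.

```sql
CREATE OR REPLACE FUNCTION data.item_upsert_array_counted (data_in data.item[])
  RETURNS bigint
AS $$
WITH upserted AS (
    INSERT INTO data.item (id, marked_for_deletion, name_)
    SELECT d.id, d.marked_for_deletion, d.name_
    FROM unnest(data_in) AS d
    ON CONFLICT (id) DO UPDATE SET
        marked_for_deletion = EXCLUDED.marked_for_deletion,
        name_ = EXCLUDED.name_
    RETURNING id
)
-- One row comes back per inserted or updated row.
SELECT count(*) FROM upserted;
$$ LANGUAGE sql;
```

Note that if the same id appears twice in one input array, ON CONFLICT DO UPDATE raises an error rather than updating the row twice, so the caller has to deduplicate first.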

I'm out over my skis a bit here, so any general comments will be much appreciated. For clarity, what I'm after is a schema for inserting multiple rows safely and with decent performance that, ideally, doesn't involve a custom type per function or COPY.

Thanks!

Recommended answer

We've got a lot of different servers pushing up to central tables in Postgres, which adds another wrinkle. What if I add a column to my table:

ALTER TABLE item ADD COLUMN category citext;

Now the table has four columns instead of three.

All of my existing pushes immediately break because now there's a column missing from the inputs. There is a 0% chance that we can update all of the servers simultaneously, so that's not an option.

One solution is to create a custom type for each version of the table:

CREATE TYPE item_v1 AS (
    id uuid,
    marked_for_deletion boolean,
    name_ citext);

CREATE TYPE item_v2 AS (
    id uuid,
    marked_for_deletion boolean,
    name_ citext,
    category citext);

And then a function for each type:

CREATE OR REPLACE FUNCTION data.item_insert_array (data_in item_v1[]) 
etc.

CREATE OR REPLACE FUNCTION data.item_insert_array (data_in item_v2[]) 
etc.
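As a sketch of what the elided body might look like for the second version, assuming the same pattern as the original function and that the category column has been added to the table:

```sql
CREATE OR REPLACE FUNCTION data.item_insert_array (data_in item_v2[])
  RETURNS int
AS $$
INSERT INTO item (id, marked_for_deletion, name_, category)
SELECT d.id, d.marked_for_deletion, d.name_, d.category
FROM unnest(data_in) d
ON CONFLICT (id) DO UPDATE SET
    marked_for_deletion = EXCLUDED.marked_for_deletion,
    name_ = EXCLUDED.name_,
    category = EXCLUDED.category;

SELECT cardinality(data_in);
$$ LANGUAGE sql;

-- The cast on the array picks which overload runs.
SELECT item_insert_array(
    ARRAY[
        ('2f888809-2777-524b-abb7-13df413440f5', true, 'Salad fork', 'Cutlery')
    ]::item_v2[]
);
```

The item_v1 version would keep its three-column INSERT, so older servers leave category at its default on insert and untouched on update.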

I guess you could have a single ginormous method that takes anyarray and uses a CASE to sort out what code to run. I wouldn't do that for a few reasons, but I suppose you could. (I've seen that approach turn gangrenous in more than one language in a real hurry.)

All of that seems like a fair bit of work. Is there a simpler technique I'm missing? I'm imagining that you could submit structured text/XML/JSON, unpack it and work from there. But I would not file that under "simpler."
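For reference, a rough sketch of the JSON route, assuming the rows arrive as a jsonb array of objects and using the built-in jsonb_to_recordset to unpack them. The function name item_upsert_json is hypothetical.

```sql
CREATE OR REPLACE FUNCTION data.item_upsert_json (data_in jsonb)
  RETURNS bigint
AS $$
WITH upserted AS (
    INSERT INTO data.item (id, marked_for_deletion, name_)
    SELECT d.id, COALESCE(d.marked_for_deletion, false), d.name_
    FROM jsonb_to_recordset(data_in)
         AS d(id uuid, marked_for_deletion boolean, name_ citext)
    ON CONFLICT (id) DO UPDATE SET
        marked_for_deletion = EXCLUDED.marked_for_deletion,
        name_ = EXCLUDED.name_
    RETURNING id
)
SELECT count(*) FROM upserted;
$$ LANGUAGE sql;

SELECT data.item_upsert_json(
    '[{"id": "2f888809-2777-524b-abb7-13df413440f5",
       "marked_for_deletion": true,
       "name_": "Salad fork"}]'::jsonb
);
```

Keys that are not named in the column list are ignored and missing keys come through as NULL, so older and newer senders can share one function, at the cost of losing the type checking that a composite array gives at the call site.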

I'm still working through the design here, obviously. I've written up enough code to test out what I've shown, but want to sort out the details before going back and implementing this on dozens of tables.

Thanks for any help.
