Postgres bulk insert/update that's injection-safe. Perhaps a function that takes an array?


Question

I'm working on paying back some technical debt this week, and it hit me that I have no idea how to make multi-value inserts safe from accidental or malicious SQL injection. We're on Postgres 11.4. I've got a test bed to work from that includes a small table with about 26K rows. Here's the declaration for that table:

BEGIN;

DROP TABLE IF EXISTS "data"."item" CASCADE;

CREATE TABLE IF NOT EXISTS "data"."item" (
    "id" uuid NOT NULL DEFAULT NULL,
    "marked_for_deletion" boolean NOT NULL DEFAULT false,
    "name_" citext NOT NULL DEFAULT NULL,

CONSTRAINT item_id_pkey
    PRIMARY KEY ("id")
);

CREATE INDEX item_marked_for_deletion_ix_bgin ON "data"."item" USING GIN("marked_for_deletion") WHERE marked_for_deletion = true;

ALTER TABLE "data"."item" OWNER TO "user_change_structure";
COMMIT;

I've been inserting into this table, and many others, using multi-value inserts, along the lines of:

BEGIN;
INSERT 
   bundle up hundreds or thousands of rows
  ON CONFLICT do what I need
COMMIT or ROLLBACK on the client side

Works fine. But how do you make a multi-value statement safe? That's what I can't figure out. This is one of those areas where I can't reason about the problem well. I don't have the appetite, aptitude, or patience for hacking things. That I can't think up an exploit means nothing; I would suck as a hacker. And, for that matter, I'm generally more concerned about errors than evil in code, since I run into errors a whole lot more often.

The standard advice I see for safe insertion is to use a prepared statement. A prepared statement for an INSERT is pretty much a temporary, runtime function for interpolation on a code template. For me, it's simpler to write an actual function, like this one:

DROP FUNCTION IF EXISTS data.item_insert_s (uuid, boolean, citext);

CREATE OR REPLACE FUNCTION data.item_insert_s (uuid, boolean, citext) 
  RETURNS int
AS $$
INSERT INTO item (
    id,
    marked_for_deletion,
    name_)

VALUES
    ($1,$2,$3)

ON CONFLICT(id) DO UPDATE SET 
    marked_for_deletion = EXCLUDED.marked_for_deletion,
    name_ = EXCLUDED.name_;

SELECT 1; -- No clue what to return, but you have to return something.

$$ LANGUAGE sql;

ALTER FUNCTION data.item_insert_s(uuid, boolean, citext) OWNER TO user_bender;
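For comparison, the prepared-statement advice mentioned above could look roughly like the sketch below. It assumes the `data.item` table declared earlier; the statement name `item_upsert` is made up for illustration. The point is that the values travel as typed parameters and are never spliced into the SQL text.

```sql
-- Hypothetical statement name; the parameter types mirror the table columns.
PREPARE item_upsert (uuid, boolean, citext) AS
INSERT INTO data.item (id, marked_for_deletion, name_)
VALUES ($1, $2, $3)
ON CONFLICT (id) DO UPDATE SET
    marked_for_deletion = EXCLUDED.marked_for_deletion,
    name_ = EXCLUDED.name_;

-- Each call supplies values only; they are treated as data, not as SQL.
EXECUTE item_upsert('2f888809-2777-524b-abb7-13df413440f5', true, 'Salad fork');

-- Prepared statements last for the session unless deallocated.
DEALLOCATE item_upsert;
```

Most client drivers expose the same mechanism through bound parameters, so the explicit `PREPARE`/`EXECUTE` pair is mainly useful for trying the idea out in psql.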

All of that works, and I've tried some timing tests. I truncate the table, do a multi-value insert, truncate, do a series of function-call inserts, and see what the difference is. I've tried multiple runs, doing the operations in different orders, etc. Both cases use a BEGIN/COMMIT block in the same way, so I end up with the same number of transactions in either test. The results vary more across tests than within them, but the multi-value insert is always faster. Congratulations to me for confirming the obvious.

Is there a way to safely do bulk inserts and updates? It occurred to me that I could write a function that takes an array or arrays, parses it out, and runs the code in a loop within the function. I'd like to test that out, but I get flummoxed by the Postgres array syntax. I've looked around, and it sounds like an array of objects and a FOREACH loop might be just what I'm after. This is a topic that has been addressed, but I haven't found a straightforward example of how to prepare the data for insertion and how to unpack it. I suspect that I won't be able to use SQL and a plain unnest() because 1) I want to make the inputs safe and 2) I might have functions that don't take all of the fields in a table as their input.

To make things a bit easier, I'm fine with functions with fixed parameter lists, and array inputs with fixed formats. I'll write code generators for my various tables, so I don't need to make the Postgres-side code any more complex than necessary.
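One possible shape for such a fixed-format function, as a sketch only: one array per column, unpacked with the multi-argument form of `unnest()`. The function name `data.item_insert_columns` and the parameter names `ids_in` and `names_in` are hypothetical, and the example deliberately takes only two of the table's three columns to show that a function need not accept every field.

```sql
-- Hypothetical function: takes parallel arrays for a subset of the columns.
CREATE OR REPLACE FUNCTION data.item_insert_columns (ids_in uuid[], names_in citext[])
  RETURNS int
AS $$
INSERT INTO data.item (id, name_)   -- marked_for_deletion falls back to its default
SELECT u.id, u.name_
FROM unnest(ids_in, names_in) AS u(id, name_)  -- pairs up elements by position

ON CONFLICT (id) DO UPDATE SET
    name_ = EXCLUDED.name_;

SELECT cardinality(ids_in);  -- number of rows submitted
$$ LANGUAGE sql;

-- Example call with two rows.
SELECT data.item_insert_columns(
    ARRAY['2f888809-2777-524b-abb7-13df413440f5',
          'f2924dda-8e63-264b-be55-2f366d9c3caa']::uuid[],
    ARRAY['Salad fork', 'Melon baller']::citext[]
);
```

If the arrays differ in length, `unnest()` pads the shorter one with NULLs, so the NOT NULL constraints on the table would reject the mismatched rows rather than silently insert them.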

Thanks for any help!

Note: I got a message asking me to explain why this question is different from my newer, related question:

Improving a function that UPSERTs based on an input array

Answer: Yes, it's the same starting point. In this question, I was asking about SQL injection; in the second question, I was trying to focus on the array-input solution. I'm not quite sure when to split out new questions, and when to let questions turn into multi-part threads.

Recommended answer

It's morning here on the Far South Coast of NSW, and I figured I'd take another crack at this. I should have mentioned before that our deployment environment is RDS, which makes COPY less appealing. But the idea of passing in an array where each element includes the row data is very appealing. It's much like a multi-value INSERT, but with different syntactic sugar. I've poked at arrays in Postgres a bit, and always come away befuddled by the syntax. I found a few really excellent threads, with lots of details from some top posters, to study:

https://dba.stackexchange.com/questions/224785/pass-array-of-mixed-type-into-stored-function

https://dba.stackexchange.com/questions/131505/use-array-of-composite-type-as-function-parameter-and-access-it

https://dba.stackexchange.com/questions/225176/how-to-pass-an-array-to-a-plpgsql-function-with-variadic-parameter/

From there, I've got a working test function:

DROP FUNCTION IF EXISTS data.item_insert_array (item[]);

CREATE OR REPLACE FUNCTION data.item_insert_array (data_in item[]) 
  RETURNS int
AS $$
INSERT INTO item (
    id, 
    marked_for_deletion, 
    name_)

SELECT
    d.id, 
    d.marked_for_deletion,
    d.name_

FROM unnest(data_in) d

ON CONFLICT(id) DO UPDATE SET 
    marked_for_deletion = EXCLUDED.marked_for_deletion,
    name_ = EXCLUDED.name_;

SELECT cardinality(data_in); -- array_length() needs a dimension argument, e.g. array_length(data_in, 1), and returns NULL for an empty array.

$$ LANGUAGE sql;

ALTER FUNCTION data.item_insert_array(item[]) OWNER TO user_bender;

To close the circle, here's an example of some input:

select * from item_insert_array(

    array[
        ('2f888809-2777-524b-abb7-13df413440f5',true,'Salad fork'),
        ('f2924dda-8e63-264b-be55-2f366d9c3caa',false,'Melon baller'),
        ('d9ecd18d-34fd-5548-90ea-0183a72de849',true,'Fondue fork')
        ]::item[]
    );

Going back to my test results, this performs roughly as well as my original multi-value insert. The other two methods I posted originally are, let's say, 4x slower. (The results are pretty erratic, but they're always a lot slower.) But I'm still left with my original question:

Is this injection safe?

If not, I guess I need to rewrite it in PL/pgSQL with a FOREACH loop and EXECUTE...USING or FORMAT to get the injection-cleaning text processing/interpolation features there. Does anyone know?
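For reference, a FOREACH rewrite might look like the sketch below. The function name `data.item_insert_loop` and the variable names are hypothetical, and the table is schema-qualified as `data.item` on the assumption that it lives in the `data` schema. Note that the INSERT here is static SQL: the values from `data_in` are passed as data, so, as far as I understand it, EXECUTE...USING or FORMAT would only come into play if the SQL text itself were assembled at runtime.

```sql
-- Hypothetical row-by-row variant of item_insert_array.
CREATE OR REPLACE FUNCTION data.item_insert_loop (data_in data.item[])
  RETURNS int
AS $$
DECLARE
    r data.item;   -- one array element (a whole row) per iteration
    n int := 0;    -- count of rows processed
BEGIN
    FOREACH r IN ARRAY data_in LOOP
        -- Static statement: r's fields are bound as values, not concatenated into SQL.
        INSERT INTO data.item (id, marked_for_deletion, name_)
        VALUES (r.id, r.marked_for_deletion, r.name_)
        ON CONFLICT (id) DO UPDATE SET
            marked_for_deletion = EXCLUDED.marked_for_deletion,
            name_ = EXCLUDED.name_;
        n := n + 1;
    END LOOP;
    RETURN n;
END;
$$ LANGUAGE plpgsql;
```

Since this issues one INSERT per element, it would probably land closer to the slower row-at-a-time timings than to the set-based unnest() version.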

I have a lot of other questions about this function (Should it be a procedure so that I can manage the transaction? How do I make the input anyarray? What would be a sensible result to return?), but I think I'll have to pursue those as their own questions.

Thanks for any help!
