Optimize INSERT / UPDATE / DELETE operation
Problem description
I wonder if the following script can be optimized somehow. It writes a lot to disk, because it deletes possibly up-to-date rows and reinserts them. I was thinking about applying something like "INSERT ... ON DUPLICATE KEY UPDATE" and found some possibilities for single-row updates, but I don't know how to apply that in the context of an INSERT INTO ... SELECT query.
CREATE OR REPLACE FUNCTION update_member_search_index() RETURNS VOID AS $$
DECLARE
    member_content_type_id INTEGER;
BEGIN
    member_content_type_id :=
        (SELECT id FROM django_content_type
         WHERE app_label = 'web' AND model = 'member');

    DELETE FROM watson_searchentry WHERE content_type_id = member_content_type_id;

    INSERT INTO watson_searchentry (engine_slug, content_type_id, object_id
                                  , object_id_int, title, description, content
                                  , url, meta_encoded)
    SELECT 'default',
           member_content_type_id,
           web_member.id,
           web_member.id,
           web_member.name,
           '',
           web_user.email || ' ' || web_member.normalized_name || ' ' || web_country.name,
           '',
           '{}'
    FROM web_member
    INNER JOIN web_user ON (web_member.user_id = web_user.id)
    INNER JOIN web_country ON (web_member.country_id = web_country.id)
    WHERE web_user.is_active = TRUE;
END;
$$ LANGUAGE plpgsql;
Edit: schemas of web_member, watson_searchentry, web_user and web_country: http://pastebin.com/3tRVPPVi
The main point is to update columns title and content in watson_searchentry. There is a trigger on the table that sets the value of column search_tsv based on these columns.
(content_type_id, object_id_int) in watson_searchentry is a unique pair in the table, but atm the index is not present (there is no use for it).
This script should be run at most once a day for full rebuilds of the search index, and occasionally after importing some data.
Answer
Modified table definition

If you really need those columns to be NOT NULL and you really need the string 'default' as default for engine_slug, I would advise to introduce column defaults:
     Column      |          Type           |          Modifiers
-----------------+-------------------------+-----------------------------
 id              | integer                 | not null default ...
 engine_slug     | character varying(200)  | not null default 'default'
 content_type_id | integer                 | not null
 object_id       | text                    | not null
 object_id_int   | integer                 |
 title           | character varying(1000) | not null
 description     | text                    | not null default ''
 content         | text                    | not null
 url             | character varying(1000) | not null default ''
 meta_encoded    | text                    | not null default '{}'
 search_tsv      | tsvector                | not null
 ...
The DDL statement would be:

ALTER TABLE watson_searchentry ALTER COLUMN engine_slug SET DEFAULT 'default';
Etc.

Then you don't have to insert those values manually every time.
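With those defaults in place, an INSERT only has to name the columns that lack a usable default; a minimal sketch (the values are made up for illustration):

```sql
-- engine_slug, description, url and meta_encoded fall back to their defaults;
-- search_tsv is filled by the existing trigger
INSERT INTO watson_searchentry (content_type_id, object_id, object_id_int, title, content)
VALUES (12, '3', 3, 'Some Member', 'mail@example.com some-member Austria');
```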
Also: object_id text NOT NULL, object_id_int integer? That's odd. I guess you have your reasons ...
I'll go with your updated requirement:

    "The main point is to update columns title and content in watson_searchentry."
Of course, you must add a UNIQUE constraint to enforce your requirements:
ALTER TABLE watson_searchentry
ADD CONSTRAINT ws_uni UNIQUE (content_type_id, object_id_int);
The accompanying index will be used, by the following query for starters.
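As an aside: on Postgres 9.5 or later (newer than what this answer targets), the same UNIQUE constraint would also allow collapsing the update and insert into a single INSERT ... ON CONFLICT statement. A sketch, assuming the column defaults introduced above; note it is not a drop-in replacement, since it does not delete stale rows:

```sql
INSERT INTO watson_searchentry
       (content_type_id, object_id, object_id_int, title, content)
SELECT ct.id, m.id::text, m.id, m.name
      ,concat_ws(' ', u.email, m.normalized_name, c.name)
FROM   web_member  m
JOIN   web_user    u ON u.id = m.user_id
JOIN   web_country c ON c.id = m.country_id
CROSS  JOIN (
   SELECT id FROM django_content_type
   WHERE  app_label = 'web' AND model = 'member') ct
WHERE  u.is_active
ON     CONFLICT (content_type_id, object_id_int) DO UPDATE
SET    object_id = EXCLUDED.object_id
      ,title     = EXCLUDED.title
      ,content   = EXCLUDED.content;
```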
BTW, I almost never use varchar(n) in Postgres. Just text. Here's one reason.
This could be rewritten as a single SQL query with data-modifying common table expressions, also called "writable" CTEs. Requires Postgres 9.1 or later.

Additionally, this query only deletes what has to be deleted, and only updates what can be updated.
WITH ctyp AS (
SELECT id AS content_type_id
FROM django_content_type
WHERE app_label = 'web'
AND model = 'member'
)
, sel AS (
SELECT ctyp.content_type_id
,m.id AS object_id_int
,m.id::text AS object_id -- explicit cast!
,m.name AS title
,concat_ws(' ', u.email,m.normalized_name,c.name) AS content
-- other columns have column default now.
FROM web_user u
JOIN web_member m ON m.user_id = u.id
JOIN web_country c ON c.id = m.country_id
CROSS JOIN ctyp
WHERE u.is_active
)
, del AS ( -- only if you want to del all other entries of same type
DELETE FROM watson_searchentry w
USING ctyp
WHERE w.content_type_id = ctyp.content_type_id
AND NOT EXISTS (
SELECT 1
FROM sel
WHERE sel.object_id_int = w.object_id_int
)
)
, up AS ( -- update existing rows
   UPDATE watson_searchentry w
   SET    object_id = s.object_id
         ,title     = s.title
         ,content   = s.content
   FROM   sel s
   WHERE  w.content_type_id = s.content_type_id
   AND    w.object_id_int   = s.object_id_int
   )
-- insert new rows
INSERT INTO watson_searchentry (
content_type_id, object_id_int, object_id, title, content)
SELECT sel.* -- safe to use, because col list is defined accordingly above
FROM sel
LEFT JOIN watson_searchentry w1 USING (content_type_id, object_id_int)
WHERE w1.content_type_id IS NULL;
- The subquery on django_content_type always returns a single value? Otherwise, the CROSS JOIN might cause trouble.
- The first CTE sel gathers the rows to be inserted. Note how I pick matching column names to simplify things.
- In the CTE del I avoid deleting rows that can be updated.
- In the CTE up those rows are updated instead.
- Accordingly, I avoid inserting rows that were not deleted before in the final INSERT.
- Can easily be wrapped into an SQL or PL/pgSQL function for repeated use.
Not secure for heavy concurrent use. Much better than the function you had, but still not 100% robust against concurrent writes. But that's not an issue according to your updated info.
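If concurrent writes ever do become a concern, one blunt but simple way to harden the rebuild is an explicit table lock for the duration of the transaction; a sketch, at the cost of blocking other writers until commit:

```sql
BEGIN;
-- blocks concurrent data changes, but still allows concurrent reads
LOCK TABLE watson_searchentry IN SHARE ROW EXCLUSIVE MODE;
-- run the writable-CTE statement from above here
COMMIT;
```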
Replacing the UPDATEs with DELETE and INSERT may or may not be a lot more expensive. Internally, every UPDATE results in a new row version anyway, due to the MVCC model.
If you don't really care about preserving old rows, your simpler approach may be faster: delete everything and insert new rows. Also, wrapping this into a plpgsql function saves a bit of planning overhead. Your function, basically, with a couple of minor simplifications and observing the defaults added above:
CREATE OR REPLACE FUNCTION update_member_search_index()
  RETURNS void AS
$func$
DECLARE
   _ctype_id int := (
      SELECT id FROM django_content_type
      WHERE  app_label = 'web'
      AND    model = 'member');  -- you can assign at declaration time, saves another statement
BEGIN
   DELETE FROM watson_searchentry
   WHERE  content_type_id = _ctype_id;

   INSERT INTO watson_searchentry
          (content_type_id, object_id, object_id_int, title, content)
   SELECT _ctype_id, m.id::text, m.id, m.name
         ,u.email || ' ' || m.normalized_name || ' ' || c.name
   FROM   web_member  m
   JOIN   web_user    u ON u.id = m.user_id
   JOIN   web_country c ON c.id = m.country_id
   WHERE  u.is_active;
END
$func$ LANGUAGE plpgsql;
I even refrain from using concat_ws() here: it is safe against NULL values and simplifies code, but it is a bit slower than simple concatenation.

Also:

    "There is a trigger on the table that sets value of column search_tsv based on these columns."

It would be faster to incorporate the logic into this function - if this is the only time the trigger is needed. Else, it's probably not worth the fuss.
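For illustration, if the trigger does nothing more than build search_tsv from title and content with to_tsvector() (an assumption; the actual trigger function is not shown), the INSERT inside the function above could compute the column directly. A sketch:

```sql
-- Assumes the trigger logic is to_tsvector() over title and content;
-- adjust the text search configuration ('english') to match the real trigger.
-- _ctype_id is the variable declared in the function above.
INSERT INTO watson_searchentry
       (content_type_id, object_id, object_id_int, title, content, search_tsv)
SELECT _ctype_id, m.id::text, m.id, m.name
      ,u.email || ' ' || m.normalized_name || ' ' || c.name
      ,to_tsvector('english', m.name || ' ' || u.email || ' '
                              || m.normalized_name || ' ' || c.name)
FROM   web_member  m
JOIN   web_user    u ON u.id = m.user_id
JOIN   web_country c ON c.id = m.country_id
WHERE  u.is_active;
```

This only pays off if the trigger is dropped or disabled for this statement; otherwise it fires and overwrites search_tsv anyway.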