优化INSERT/UPDATE/DELETE操作 [英] Optimize INSERT / UPDATE / DELETE operation

查看:133
本文介绍了优化INSERT/UPDATE/DELETE操作的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我想知道以下脚本是否可以某种方式进行优化.它确实在磁盘上写了很多东西,因为它可能删除了最新的行并重新插入它们.我当时正在考虑应用在重复键更新中插入..."之类的东西,并发现了单行更新的一些可能性,但我不知道如何在INSERT INTO ... SELECT query的上下文中应用它.

I wonder if the following script can be optimized somehow. It does write a lot to disk because it deletes possibly up-to-date rows and reinserts them. I was thinking about applying something like "insert ... on duplicate key update" and found some possibilities for single-row updates but I don't know how to apply it in the context of INSERT INTO ... SELECT query.

CREATE OR REPLACE FUNCTION update_member_search_index() RETURNS VOID AS $$
DECLARE
   member_content_type_id INTEGER;
BEGIN
   member_content_type_id :=
      (SELECT id FROM django_content_type
       WHERE app_label='web' AND model='member');

   DELETE FROM watson_searchentry WHERE content_type_id = member_content_type_id;

   INSERT INTO watson_searchentry (engine_slug, content_type_id, object_id
                                 , object_id_int, title, description, content
                                 , url, meta_encoded)
   SELECT 'default',
         member_content_type_id,
         web_member.id,
         web_member.id,
         web_member.name,
         '',
         web_user.email||' '||web_member.normalized_name||' '||web_country.name,
         '',
         '{}'
   FROM web_member
   INNER JOIN web_user ON (web_member.user_id = web_user.id)
   INNER JOIN web_country ON (web_member.country_id = web_country.id)
   WHERE web_user.is_active=TRUE;
END;
$$ LANGUAGE plpgsql;

编辑:web_memberwatson_searchentryweb_userweb_country的架构: http://pastebin.com/3tRVPPVi .

主要要点是更新watson_searchentry中的列titlecontent.表上有一个触发器,可根据这些列设置列search_tsv的值.

The main point is to update columns title and content in watson_searchentry. There is a trigger on the table that sets value of column search_tsv based on these columns.

(content_type_id, object_id_int)在表中是唯一的对,但是atm索引不存在(没有用).

(content_type_id, object_id_int) in watson_searchentry is unique pair in the table but atm the index is not present (there is no use for it).

此脚本最多应每天运行一次,以完全重建搜索索引,并偶尔在导入一些数据之后运行.

This script should be run at most once a day for full rebuilds of search index and occasionally after importing some data.

推荐答案

修改后的表格定义

如果您确实需要将这些列设置为NOT NULL,并且确实需要将字符串'default'作为engine_slug的默认值,我建议您引入列默认值:

Modified table definition

If you really need those columns to be NOT NULL and you really need the string 'default' as default for engine_slug, I would advice to introduce column defaults:

COLUMN           |          TYPE           |      Modifiers
-----------------+-------------------------+---------------------
 id              | INTEGER                 | NOT NULL DEFAULT ... 
 engine_slug     | CHARACTER VARYING(200)  | NOT NULL DEFAULT 'default'
 content_type_id | INTEGER                 | NOT NULL
 object_id       | text                    | NOT NULL
 object_id_int   | INTEGER                 |
 title           | CHARACTER VARYING(1000) | NOT NULL
 description     | text                    | NOT NULL DEFAULT ''
 content         | text                    | NOT NULL
 url             | CHARACTER VARYING(1000) | NOT NULL DEFAULT ''
 meta_encoded    | text                    | NOT NULL DEFAULT '{}'
 search_tsv      | tsvector                | NOT NULL
 ...

DDL语句为:

ALTER TABLE watson_searchentry ALTER COLUMN  engine_slug DEFAULT 'default';

等等.

那么您不必每次都手动插入这些值.

Then you don't have to insert those values manually every time.

还:object_id text NOT NULL, object_id_int INTEGER?真奇怪我想你有你的理由...

Also: object_id text NOT NULL, object_id_int INTEGER? That's odd. I guess you have your reasons ...

我将遵循您的更新要求:

I'll go with your updated requirement:

主要要点是更新watson_searchentry

当然,您必须添加 UNIQUE 约束以实施您的要求:

Of course, you must add a UNIQUE constraint to enforce your requirements:

ALTER TABLE watson_searchentry
ADD CONSTRAINT ws_uni UNIQUE (content_type_id, object_id_int)

将使用附带的索引.通过查询初学者.

The accompanying index will be used. By this query for starters.

顺便说一句,我几乎从未在Postgres中使用varchar(n).只是text. 这是一个原因.

BTW, I almost never use varchar(n) in Postgres. Just text. Here's one reason.

可以将其重写为具有数据修改通用表表达式(也称为可写" CTE)的单个SQL查询.需要Postgres 9.1或更高版本.
此外,此查询仅删除必须删除的内容,并更新可以更新的内容.

This could be rewritten as a single SQL query with data-modifying common table expressions, also called "writeable" CTEs. Requires Postgres 9.1 or later.
Additionally, this query only deletes what has to be deleted, and updates what can be updated.

WITH  ctyp AS (
   SELECT id AS content_type_id
   FROM   django_content_type
   WHERE  app_label = 'web'
   AND    model = 'member'
   )
, sel AS (
   SELECT ctyp.content_type_id
         ,m.id       AS object_id_int
         ,m.id::text AS object_id       -- explicit cast!
         ,m.name     AS title
         ,concat_ws(' ', u.email,m.normalized_name,c.name) AS content
         -- other columns have column default now.
   FROM   web_user    u
   JOIN   web_member  m  ON m.user_id = u.id
   JOIN   web_country c  ON c.id = m.country_id
   CROSS  JOIN ctyp
   WHERE  u.is_active
   )
, del AS (     -- only if you want to del all other entries of same type
   DELETE FROM watson_searchentry w
   USING  ctyp
   WHERE  w.content_type_id = ctyp.content_type_id
   AND    NOT EXISTS (
      SELECT 1
      FROM   sel
      WHERE  sel.object_id_int = w.object_id_int
      )
   )
, up AS (      -- update existing rows
   UPDATE watson_searchentry 
   SET    object_id = s.object_id
         ,title     = s.title
         ,content   = s.content
   FROM   sel s
   WHERE  w.content_type_id = s.content_type_id
   AND    w.object_id_int   = s.object_id_int
   )
               -- insert new rows
INSERT  INTO watson_searchentry (
        content_type_id, object_id_int, object_id, title, content)
SELECT  sel.*  -- safe to use, because col list is defined accordingly above
FROM    sel
LEFT    JOIN watson_searchentry w1 USING (content_type_id, object_id_int)
WHERE   w1.content_type_id IS NULL;

  • django_content_type上的子查询始终返回单个值?否则,CROSS JOIN可能会引起麻烦.

    • The subquery on django_content_type always returns a single value? Otherwise, the CROSS JOIN might cause trouble.

      第一个CTE sel收集要插入的行.请注意,我如何选择匹配的列名来简化操作.

      The first CTE sel gathers the rows to be inserted. Note how I pick matching column names to simplify things.

      在CTE del中,我避免删除可以更新的行.

      In the CTE del I avoid deleting rows that can be updated.

      在CTE up中,这些行将被更新.

      In the CTE up those rows are updated instead.

      因此,我避免在最终的INSERT中插入之前未删除的行.

      Accordingly, I avoid inserting rows that were not deleted before in the final INSERT.

      可以轻松地包装到SQL或PL/pgSQL函数中以重复使用.

      Can easily be wrapped into an SQL or PL/pgSQL function for repeated use.

      对于大量同时使用不安全.比您拥有的功能要好得多,但仍不能100%健壮地抵抗并发写入.但是,根据您更新的信息,这不是问题.

      Not secure for heavy concurrent use. Much better than the function you had, but still not 100% robust against concurrent writes. But that's not an issue according to your updated info.

      用DELETE和INSERT替换UPDATE可能会或可能不会昂贵得多.在内部,由于 MVCC模型,无论如何,每个UPDATE都会产生新的行版本.

      Replacing the UPDATEs with DELETE and INSERT may or may not be a lot more expensive. Internally every UPDATE results in a new row version anyways, due to the MVCC model.

      如果您不太在意保留旧行,则更简单的方法可能会更快:删除所有内容并插入新行.同样,包装到plpgsql函数中可以节省一些计划开销.基本上,您的函数做了一些小的简化,并遵循上面添加的默认值:

      If you don't really care about preserving old rows, your simpler approach may be faster: Delete everything and insert new rows. Also, wrapping into a plpgsql function saves a bit of planning overhead. Your function basically, with a couple of minor simplifications and observing the defaults added above:

      CREATE OR REPLACE FUNCTION update_member_search_index()
        RETURNS VOID AS
      $func$
      DECLARE
         _ctype_id int := (
            SELECT id
            FROM   django_content_type
            WHERE  app_label='web'
            AND    model = 'member'
            );  -- you can assign at declaration time. saves another statement
      BEGIN
         DELETE FROM watson_searchentry
         WHERE content_type_id = _ctype_id;
      
         INSERT INTO watson_searchentry
               (content_type_id, object_id, object_id_int, title, content)
         SELECT _ctype_id, m.id, m.id::int,m.name
               ,u.email || ' ' || m.normalized_name || ' ' || c.name
         FROM   web_member  m
         JOIN   web_user    u USING (user_id)
         JOIN   web_country c ON c.id = m.country_id
         WHERE  u.is_active;
      END
      $func$ LANGUAGE plpgsql;
      

      我什至不使用 :它可以安全地抵抗NULL值并简化代码,但比简单的串联要慢一些.

      I even refrain from using concat_ws(): It is safe against NULL values and simplifies code, but a bit slower than simple concatenation.

      也:

      该表上有一个触发器,用于设置列search_tsv的值 基于这些列.

      There is a trigger on the table that sets value of column search_tsv based on these columns.

      将逻辑合并到此功能中会更快-如果这是唯一需要触发器的时间.否则,可能不值得大惊小怪.

      It would be faster to incorporate the logic into this function - if this is the only time the trigger is needed. Else, it's probably not worth the fuss.

      这篇关于优化INSERT/UPDATE/DELETE操作的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
相关文章
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆