SQL to merge rows

Question

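How can I select the rows from a table that, when ordered, match an array of ids such that

  • the first element matches some row
  • the second element matches the next row
  • the third element matches the row after the second
  • the fourth element matches the row after the third
  • and so on, until the values in the array run out?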

Suppose I have these rows as the result of a query (table token holds id and word, and table positioning holds id and position):

 id | word | textblockid |sentence |position 
 5  | Fear |      5      |    1    |    1
 8  | of   |      5      |    1    |    2
 6  | the  |      5      |    1    |    3
 7  | Dark |      5      |    1    |    4
 9  | is   |      5      |    1    |    5

This information can be spread across the table with different textblockid, sentence and position values.

I want to transform it into this:

 id  | word             | textblockid | sentence |position 
 10  | Fear of the Dark |      5      |     1    |    1
  9  | is               |      5      |     1    |    2

I'm writing a function that receives an array with the ids to merge, something like merge_tokens('{5,8,6,7}').

I insert the new word Fear of the Dark into the table token and get the generated id (in the example, the id is 10). This is easy.

I need to update the id of the first word (in this case, Fear) to 10 and delete the following words (of, the, Dark).

My question is how to perform these operations. I think I need to SELECT from an ordered table where the first row's id matches the first element of the id array, the second row's id matches the second element, and so on, and then update the first matched row and delete the following ones.

I can't simply delete the other rows by id, because the same ids are used in other word sequences. I will only delete the of whose previous word is Fear and whose next words are the and Dark. By the same rule, I can only delete the the that is preceded by Fear and of and followed by Dark.

For example, the same table can contain rows like these, which must not be affected:

 id  | word      | textblockid |sentence |position 
 6   | the       |      8      |    3    |    10
 11  | sound     |      8      |    3    |    21
 8   | of        |      8      |    3    |    12
 6   | the       |      8      |    3    |    13
 7   | mountain  |      8      |    3    |    14

Recommended answer

After answering most of your recent questions I have a vague idea of what you are doing. So I had a closer look at your solution and optimized quite a bit. Mostly I simplified the code, but there are some substantial improvements, too.

  • Don't use the undocumented assignment operator = in plpgsql. Use := instead. See this related question for more info.
  • Why LOOP BEGIN? A separate code block only slows things down if you don't need it. I removed it.
  • There are many more changes; I added a few comments.
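
The first two points can be illustrated with a small, self-contained sketch. The function count_long_words and its parameters are hypothetical and not part of the original code; it only shows := assignment, initialization at declaration time, and a loop body without a nested BEGIN ... END block. FOREACH assumes PostgreSQL 9.1 or later.

```sql
-- Hypothetical helper, for illustration only: counts the words in an
-- array that have at least min_len characters.
CREATE OR REPLACE FUNCTION count_long_words(words varchar[], min_len integer)
  RETURNS integer AS
$body$
DECLARE
    w      varchar;
    total  integer := 0;          -- initialized at declaration time
BEGIN
    FOREACH w IN ARRAY words      -- the loop body needs no BEGIN ... END of its own
    LOOP
        IF length(w) >= min_len THEN
            total := total + 1;   -- assignment with :=
        END IF;
    END LOOP;
    RETURN total;
END;
$body$ LANGUAGE plpgsql;

SELECT count_long_words('{Fear,of,the,Dark}', 3);
```

A nested block inside the loop is still the right tool when you need an EXCEPTION handler scoped to a single iteration; without one it adds nothing.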

Please look at the two versions of the code side by side for some hints.
Test both versions to see which performs faster.

For your consideration:

CREATE OR REPLACE FUNCTION merge_tokens(words varchar[], separator varchar)
  RETURNS VOID AS
$body$
DECLARE         
    r              record;
    current_id     integer;
    ids            integer[];
    generated_word varchar :=  '';  -- you can initialize variables at declaration time. Saves additional assignment.

BEGIN
    -- get the ids and generate the word
    RAISE NOTICE 'Getting ids and generating words';
    generated_word := array_to_string(words, separator);  -- 1 assignment is much cheaper. Also: no trim() needed.
    ids := ARRAY
    (  SELECT t.id
       FROM  (
          SELECT row_number() OVER () AS rn, text
          FROM  (SELECT unnest(words) AS text) x) y
          JOIN   token t USING (text)
       ORDER  BY rn);
    RAISE NOTICE 'Generated word: %', generated_word;

    -- check whether the merged word already exists; insert it if not
    SELECT INTO current_id  t.id FROM token t WHERE t.text = generated_word; 
    IF NOT FOUND THEN
        RAISE NOTICE 'Word doesn''t exist yet';
        INSERT INTO token(text) VALUES(generated_word)
        RETURNING id
        INTO current_id;  --get the last value without additional query.
    END IF;
    RAISE NOTICE 'Word id: %', current_id;

    -- select the records that will be updated
    RAISE NOTICE 'Getting words to be updated.';
    FOR r IN
        SELECT textblockid, sentence, position, tokenid, rn
        FROM
        ( -- select the rows that are complete
          SELECT textblockid, sentence, position, tokenid, rn, count(*) OVER (PARTITION BY grp) AS counting
          FROM
          ( -- match source with lookup table
                SELECT source.textblockid, source.sentence, source.position, source.tokenid, source.rn, source.grp
                FROM
                (   -- select textblocks where words appears with row number to matching
                     SELECT tb.textblockid, tb.sentence, tb.position, tb.tokenid, grp
                                           ,CASE WHEN grp > 0 THEN
                                            row_number() OVER (PARTITION BY grp ORDER BY tb.textblockid, tb.sentence, tb.position)
                                            END AS rn               
                     FROM
                     (   -- create the groups to be used in partition by to generate the row numbers
                          SELECT tb.textblockid, tb.sentence, tb.position, tb.tokenid
                                ,SUM(CASE WHEN tb.tokenid = ids[1] THEN 1 ELSE 0 END) OVER (ORDER BY tb.textblockid, tb.sentence, tb.position) AS grp
                          FROM  textblockhastoken tb
                          JOIN
                          (   --select the textblocks where the word appears
                                SELECT textblockid, sentence
                                FROM   textblockhastoken tb
                                WHERE  tb.tokenid = ids[1]
                          ) res USING (textblockid, sentence)
                     ) tb
                ) source
                -- create the lookup table to match positions
                JOIN (SELECT row_number() OVER () as rn, id FROM unnest(ids) AS id) lookup USING (rn)
                WHERE source.tokenid = lookup.id
          ) merged
        ) g  
        WHERE g.counting = array_length(ids,1)
        ORDER BY g.rn --order by row number to update first, delete and change positions after
    LOOP
        --check if update or delete
        IF (r.rn = 1) THEN
            RAISE NOTICE 'Updating word in T:% S:% P:%', r.textblockid, r.sentence, r.position;
            UPDATE textblockhastoken tb SET tokenid = current_id
            WHERE (tb.textblockid, tb.sentence, tb.position)
                = ( r.textblockid,  r.sentence,  r.position);
        ELSE
            RAISE NOTICE 'Deleting word in T:% S:% P:%', r.textblockid, r.sentence, r.position;
            DELETE FROM textblockhastoken tb
            WHERE (tb.textblockid, tb.sentence, tb.position)
                = ( r.textblockid,  r.sentence,  r.position);
        END IF;
        --check if is the last word to update the positions
        IF (r.rn = array_length(ids,1)) THEN
            RAISE NOTICE 'Changing positions in T:% S:%', r.textblockid, r.sentence;
            UPDATE textblockhastoken tb SET position = new_position
            FROM
            (   SELECT textblockid, sentence, position
                      ,row_number() OVER (PARTITION BY tb.textblockid, tb.sentence ORDER BY tb.position) as new_position
                FROM   textblockhastoken tb
                WHERE  tb.textblockid = r.textblockid AND tb.sentence = r.sentence
            ) np
            WHERE (tb.textblockid, tb.sentence, tb.position)
                = (np.textblockid, np.sentence, np.position)
            AND    tb.position <> np.new_position;
        END IF;
    END LOOP;
END;
$body$ LANGUAGE plpgsql;
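
To see the core of the matching query in isolation, here is a minimal sketch on inline data. It is a simplification under stated assumptions, not a replacement for the function: the view name merge_candidates is hypothetical, the sample rows are copied from the question, the id array {5,8,6,7} is hard-coded, and unnest ... WITH ORDINALITY assumes PostgreSQL 9.4 or later (the function above uses row_number() OVER () for the same purpose). Unlike the function, the sketch does not first restrict the scan to text blocks that contain the first id.

```sql
CREATE TEMP VIEW merge_candidates AS
WITH textblockhastoken(textblockid, sentence, position, tokenid) AS (
    -- sample rows from the question
    VALUES (5, 1,  1,  5), (5, 1,  2,  8), (5, 1,  3, 6), (5, 1,  4, 7), (5, 1,  5, 9),
           (8, 3, 10,  6), (8, 3, 21, 11), (8, 3, 12, 8), (8, 3, 13, 6), (8, 3, 14, 7)
),
grouped AS (
    -- running count of occurrences of the first id: every occurrence starts a new group
    SELECT *,
           sum(CASE WHEN tokenid = 5 THEN 1 ELSE 0 END)
               OVER (ORDER BY textblockid, sentence, position) AS grp
    FROM   textblockhastoken
),
numbered AS (
    -- number the rows inside each group; rows before the first occurrence (grp = 0) are dropped
    SELECT *,
           row_number() OVER (PARTITION BY grp
                              ORDER BY textblockid, sentence, position) AS rn
    FROM   grouped
    WHERE  grp > 0
),
matched AS (
    -- keep rows whose token equals the array element at the same ordinal position
    SELECT n.*, count(*) OVER (PARTITION BY n.grp) AS counting
    FROM   numbered n
    JOIN   unnest(ARRAY[5, 8, 6, 7]) WITH ORDINALITY AS lookup(id, rn) USING (rn)
    WHERE  n.tokenid = lookup.id
)
SELECT textblockid, sentence, position, tokenid, rn
FROM   matched
WHERE  counting = 4;   -- only groups in which all four elements matched

SELECT * FROM merge_candidates ORDER BY rn;
```

With this data the view should return the four rows at positions 1 to 4 of text block 5; the rows of text block 8 do not line up with the array and are left alone. The function itself takes the words and a separator rather than ids, so a call would look like SELECT merge_tokens('{Fear,of,the,Dark}', ' ');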
