SQL to merge rows
Problem description
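How can I select the rows from a table such that, when the table is ordered:
- the first element of an array matches some row,
- the second element matches the next row,
- the third element matches the row after the second one,
- the fourth element matches the row after the third one,
- and so on, until the values in the array run out?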
Suppose a query returns these rows (table token holds id and word, and table positioning holds id and position):
id | word | textblockid | sentence | position
5  | Fear | 5           | 1        | 1
8  | of   | 5           | 1        | 2
6  | the  | 5           | 1        | 3
7  | Dark | 5           | 1        | 4
9  | is   | 5           | 1        | 5
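This information can be spread across the table with different textblockid, sentence and position values.
I want to transform the rows above into: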
id | word             | textblockid | sentence | position
10 | Fear of the Dark | 5           | 1        | 1
9  | is               | 5           | 1        | 2
I'm writing a function that receives an array with the ids to merge, something like merge_tokens('{5,8,6,7}').
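To match "the first element, the second element, and so on", each id has to be paired with its index in the array. Here is a minimal sketch. It assumes PostgreSQL 9.4 or later, where WITH ORDINALITY numbers the elements in array order:

```sql
-- Pair each id with its 1-based index k in the array.
-- WITH ORDINALITY adds the index column; the ORDER BY only makes the output order explicit.
SELECT w.k, w.id
FROM   unnest('{5,8,6,7}'::int[]) WITH ORDINALITY AS w(id, k)
ORDER  BY w.k;
```

Each result row pairs an index with an id (index 2 with id 8, for instance). Joining on that index is what lets "row number k" be compared with "array element k".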
I insert the new word Fear of the Dark into the table token and get the generated id (in the example, id 10). This part is easy.
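In PostgreSQL, the insert and the id lookup can be a single statement using RETURNING. This sketch runs against a hypothetical temporary table, token_demo, which stands in for the question's token table (id, word):

```sql
-- Hypothetical stand-in for the question's token table.
CREATE TEMP TABLE token_demo (
  id   serial PRIMARY KEY,
  word text NOT NULL UNIQUE
);

-- Insert the merged word and get its generated id back in the same statement.
INSERT INTO token_demo (word)
VALUES ('Fear of the Dark')
RETURNING id;
```

Inside a PL/pgSQL function, the same clause can be written RETURNING id INTO some_variable, which avoids a second query.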
I need to update the id of the first word (in this case, Fear) to 10 and delete the following words (of, the, Dark).
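Once an occurrence has been located, the update and the delete can be restricted to that occurrence's positions instead of to token ids. This sketch assumes the occurrence starts at position 1 of text block 5, sentence 1, and spans four tokens, as in the example. merge_demo is a hypothetical stand-in for the answer's textblockhastoken table:

```sql
-- Hypothetical stand-in for the answer's textblockhastoken table.
CREATE TEMP TABLE merge_demo (
  textblockid integer,
  sentence    integer,
  position    integer,
  tokenid     integer,
  PRIMARY KEY (textblockid, sentence, position)
);

INSERT INTO merge_demo VALUES
  (5, 1, 1, 5),   -- Fear
  (5, 1, 2, 8),   -- of
  (5, 1, 3, 6),   -- the
  (5, 1, 4, 7),   -- Dark
  (5, 1, 5, 9);   -- is

-- The match is assumed to be already found: it starts at position 1 and spans 4 tokens.
-- Point the first row at the new token id 10 ...
UPDATE merge_demo
SET    tokenid = 10
WHERE  (textblockid, sentence, position) = (5, 1, 1);

-- ... and delete only the rows that belong to this occurrence.
DELETE FROM merge_demo
WHERE  textblockid = 5 AND sentence = 1
AND    position BETWEEN 2 AND 4;
```

Only positions 2 to 4 of that sentence are deleted, so "of" and "the" elsewhere in the text are untouched. The row for "is" keeps position 5 until the positions are renumbered.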
My question is how to perform these operations. I think I need to SELECT from an ordered table where the first row's id matches the first element of the id array, the second row's id matches the second element, and so on. After that, I would update the first row and remove the following ones.
I can't simply delete the other rows by id, because the same tokens are also used in other phrases. I will only delete the "of" whose previous word is "Fear" and whose next words are "the" and then "Dark". Following this rule, I can only delete the "the" whose previous word is "of", whose word before that is "Fear", and whose next word is "Dark".
For example, the same table can also contain rows like these, which must not be affected:
id | word     | textblockid | sentence | position
6  | the      | 8           | 3        | 10
11 | sound    | 8           | 3        | 21
8  | of       | 8           | 3        | 12
6  | the      | 8           | 3        | 13
7  | mountain | 8           | 3        | 14
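The rule depends only on neighbouring positions, so there is an alternative to walking rows in order. Anchor on every row that holds the first id, then check that the row at position + k - 1 holds array element k.

This is a set-based sketch, not the answer's method. It assumes positions within a sentence are consecutive integers and PostgreSQL 9.4 or later. find_demo is a hypothetical table modeled on the rows above; the position of "sound" is set to 11 here so that the sentence is consecutive:

```sql
-- Hypothetical stand-in for the answer's textblockhastoken table.
CREATE TEMP TABLE find_demo (
  textblockid integer,
  sentence    integer,
  position    integer,
  tokenid     integer,
  PRIMARY KEY (textblockid, sentence, position)
);

INSERT INTO find_demo VALUES
  -- "Fear of the Dark is"
  (5, 1, 1, 5), (5, 1, 2, 8), (5, 1, 3, 6), (5, 1, 4, 7), (5, 1, 5, 9),
  -- "the sound of the mountain": shares some ids but must not match
  (8, 3, 10, 6), (8, 3, 11, 11), (8, 3, 12, 8), (8, 3, 13, 6), (8, 3, 14, 7);

-- For every row holding the first id, require that element k of the array
-- sits at position start + k - 1 in the same sentence.
WITH params AS (SELECT '{5,8,6,7}'::int[] AS ids)
SELECT s.textblockid, s.sentence, s.position AS start_position
FROM   params p
JOIN   find_demo s ON s.tokenid = p.ids[1]
CROSS  JOIN LATERAL unnest(p.ids) WITH ORDINALITY AS w(id, k)
JOIN   find_demo t
       ON  t.textblockid = s.textblockid
       AND t.sentence    = s.sentence
       AND t.position    = s.position + (w.k - 1)
       AND t.tokenid     = w.id
GROUP  BY s.textblockid, s.sentence, s.position, p.ids
HAVING count(*) = cardinality(p.ids);  -- every array element matched
```

Only text block 5, sentence 1, starting at position 1 is returned. "of the mountain" in sentence 3 contains 8, 6, 7 but is not preceded by 5, so it does not match.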
Recommended answer
After answering most of your recent questions I have a vague idea of what you are doing. So I had a closer look at your solution and optimized quite a bit. Mostly I simplified the code, but there are some substantial improvements, too.
- Don't use the undocumented assignment operator = in plpgsql. Use := instead. See this related question for more info.
- Why LOOP BEGIN? A separate code block only slows things down if you don't need it. I removed it.
- Many more changes; I added a few comments.
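The first two tips can be seen in a small function. assignment_demo is hypothetical and not part of the answer. Current PostgreSQL documentation accepts = as an alternative, but := is the PL/SQL-compatible spelling. The documentation singles out blocks with an EXCEPTION clause as notably more expensive to enter and exit:

```sql
-- Hypothetical demo function, not part of the answer.
CREATE OR REPLACE FUNCTION assignment_demo(n integer)
  RETURNS integer
  LANGUAGE plpgsql AS
$$
DECLARE
  total integer := 0;    -- initialized at declaration time
BEGIN
  FOR i IN 1 .. n LOOP   -- the loop body needs no extra BEGIN ... END block
    total := total + i;  -- := assignment
  END LOOP;
  RETURN total;
END
$$;

SELECT assignment_demo(4);  -- 1 + 2 + 3 + 4
```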
Please have a look at the code side-by-side for some hints.
Test the two versions to see which performs faster.
For your consideration:
CREATE OR REPLACE FUNCTION merge_tokens(words varchar[], separator varchar)
  RETURNS void AS
$body$
DECLARE
   r              record;
   current_id     integer;
   ids            integer[];
   generated_word varchar := '';     -- you can initialize variables at declaration time. Saves an additional assignment.
   merged_tb      integer[] := '{}'; -- textblockid of every merged occurrence
   merged_s       integer[] := '{}'; -- sentence of every merged occurrence
BEGIN
   -- get the ids and generate the word
   RAISE NOTICE 'Getting ids and generating words';
   generated_word := array_to_string(words, separator);  -- 1 assignment is much cheaper. Also: no trim() needed.
   -- WITH ORDINALITY (PostgreSQL 9.4+) guarantees the ids come back in array order
   ids := ARRAY(
      SELECT t.id
      FROM   unnest(words) WITH ORDINALITY AS w(text, rn)
      JOIN   token t USING (text)
      ORDER  BY w.rn);
   RAISE NOTICE 'Generated word: %', generated_word;

   -- check whether the word already exists; insert it if it doesn't
   SELECT t.id INTO current_id FROM token t WHERE t.text = generated_word;
   IF NOT FOUND THEN
      RAISE NOTICE 'Word doesn''t exist';
      INSERT INTO token(text) VALUES (generated_word)
      RETURNING id INTO current_id;  -- get the new id without an additional query
   END IF;
   RAISE NOTICE 'Word id: %', current_id;

   -- select the records that will be updated
   RAISE NOTICE 'Getting words to be updated.';
   FOR r IN
      SELECT textblockid, sentence, position, tokenid, rn
      FROM  (  -- keep only the groups that match completely
         SELECT textblockid, sentence, position, tokenid, rn
              , count(*) OVER (PARTITION BY textblockid, sentence, grp) AS counting
         FROM  (  -- match source with lookup table
            SELECT source.textblockid, source.sentence, source.position, source.tokenid, source.rn, source.grp
            FROM  (  -- number the rows of each group, starting at the first word
               SELECT tb.textblockid, tb.sentence, tb.position, tb.tokenid, tb.grp
                    , CASE WHEN tb.grp > 0 THEN
                         row_number() OVER (PARTITION BY tb.textblockid, tb.sentence, tb.grp ORDER BY tb.position)
                      END AS rn
               FROM  (  -- start a new group at every occurrence of the first word; groups never cross sentences
                  SELECT tb.textblockid, tb.sentence, tb.position, tb.tokenid
                       , sum(CASE WHEN tb.tokenid = ids[1] THEN 1 ELSE 0 END)
                            OVER (PARTITION BY tb.textblockid, tb.sentence ORDER BY tb.position) AS grp
                  FROM   textblockhastoken tb
                  JOIN  (  -- sentences where the first word appears; DISTINCT prevents duplicated rows
                     SELECT DISTINCT textblockid, sentence
                     FROM   textblockhastoken tb
                     WHERE  tb.tokenid = ids[1]
                     ) res USING (textblockid, sentence)
                  ) tb
               ) source
            -- lookup table: index of each id in the array
            JOIN   unnest(ids) WITH ORDINALITY AS lookup(id, rn) USING (rn)
            WHERE  source.tokenid = lookup.id
            ) merged
         ) g
      WHERE  g.counting = array_length(ids, 1)
      ORDER  BY g.rn  -- update the first words first, then delete the following ones
   LOOP
      IF r.rn = 1 THEN  -- first word: point it at the merged token
         RAISE NOTICE 'Updating word in T:% S:% P:%', r.textblockid, r.sentence, r.position;
         UPDATE textblockhastoken tb SET tokenid = current_id
         WHERE  (tb.textblockid, tb.sentence, tb.position)
              = ( r.textblockid,  r.sentence,  r.position);
         merged_tb := merged_tb || r.textblockid;  -- remember the sentence for renumbering
         merged_s  := merged_s  || r.sentence;
      ELSE  -- following words: delete them
         RAISE NOTICE 'Deleting word in T:% S:% P:%', r.textblockid, r.sentence, r.position;
         DELETE FROM textblockhastoken tb
         WHERE  (tb.textblockid, tb.sentence, tb.position)
              = ( r.textblockid,  r.sentence,  r.position);
      END IF;
   END LOOP;

   -- close the gaps once, after all merges: renumbering inside the loop would shift the
   -- positions of other matches in the same sentence before the loop reaches them
   RAISE NOTICE 'Changing positions';
   UPDATE textblockhastoken tb SET position = np.new_position
   FROM  (
      SELECT tb.textblockid, tb.sentence, tb.position
           , row_number() OVER (PARTITION BY tb.textblockid, tb.sentence ORDER BY tb.position) AS new_position
      FROM   textblockhastoken tb
      WHERE  (tb.textblockid, tb.sentence) IN (SELECT * FROM unnest(merged_tb, merged_s))
      ) np
   WHERE  (tb.textblockid, tb.sentence, tb.position)
        = (np.textblockid, np.sentence, np.position)
   AND    tb.position <> np.new_position;
END
$body$ LANGUAGE plpgsql;
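The position renumbering step in the function can be tried on its own. This sketch uses a hypothetical temporary table, renumber_demo, holding the state after "of the Dark" has been deleted, and applies the same row_number() pattern:

```sql
-- Hypothetical stand-in for textblockhastoken after the merge.
CREATE TEMP TABLE renumber_demo (
  textblockid integer,
  sentence    integer,
  position    integer,
  tokenid     integer
);

INSERT INTO renumber_demo VALUES
  (5, 1, 1, 10),  -- Fear of the Dark
  (5, 1, 5, 9);   -- is (positions 2-4 were deleted)

-- Renumber the sentence 1, 2, 3, ... in position order, touching only rows that move.
UPDATE renumber_demo tb
SET    position = np.new_position
FROM  (
        SELECT textblockid, sentence, position,
               row_number() OVER (PARTITION BY textblockid, sentence
                                  ORDER BY position) AS new_position
        FROM   renumber_demo
        WHERE  textblockid = 5 AND sentence = 1
      ) np
WHERE  (tb.textblockid, tb.sentence, tb.position)
     = (np.textblockid, np.sentence, np.position)
AND    tb.position <> np.new_position;
```

Afterwards "is" sits at position 2, matching the desired output in the question.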