拆分逗号分隔值并将它们映射到 SQLite 中的原始 ID [英] Split comma-separated values and map them to original ID in SQLite
问题描述
我有一个名为 articles
的表,其中包含以下格式的数据:
I have a table called articles
that contains data in the following format:
id|categories
--+----------
1|123,13,43
2|1,3,15
3|9,17,44,18,3
出于测试目的,您可以使用以下 SQL 命令创建此表:
For testing purposes, you may create this table using the following SQL commands:
CREATE TABLE articles(id INTEGER PRIMARY KEY, categories TEXT);
INSERT INTO articles VALUES(1, '123,13,43'), (2, '1,3,15'), (3, '9,17,44,18,3');
现在我想拆分 categories
列的值,以便得到如下所示的表:
Now I would like to split the values of the categories
column so that a table like the following one is the result:
id|category
--+--------
1|123
1|13
1|43
2|1
2|3
2|15
3|9
3|17
3|44
3|18
3|3
如您所见,我想将原始表带入第一范式.
As you can see, I would like to bring the original table into the First normal form.
从这个答案中,我已经知道如何以这种方式只拆分一行.以下代码示例仅取第二行(即 id=2 的行)并以所需方式拆分它们:
I already know how to split just one row in this way, from this answer. The following code example just takes the second row (i.e. the one where id=2) and splits them in the desired way:
WITH split(article_id, word, str, offsep) AS
(
VALUES
(
2,
'',
(SELECT categories FROM articles WHERE id=2),
1
)
UNION ALL
SELECT
article_id,
substr(str, 0, CASE WHEN instr(str, ',') THEN instr(str, ',') ELSE length(str)+1 END),
ltrim(substr(str, instr(str, ',')), ','),
instr(str, ',')
FROM split
WHERE offsep
) SELECT article_id, word FROM split WHERE word!='';
当然这是非常不灵活的,因为文章 ID 需要进行硬编码.所以,现在我的问题是:我必须在上面的 SQLite 代码中添加或更改什么才能使其对所有行进行操作并输出所需的结果?
Of course this is very unflexible, as the article ID needs to be hard-coded. So, now my question is: What do I have to add to or change in the upper SQLite code to make it operate on all rows and output the desired result?
推荐答案
经过一番折腾,我终于自己找到了解决方案.它还处理将 ''
或 NULL
作为 categories
值的行:
After some playing around, I finally figured out the solution myself. It also takes care of rows that have ''
or NULL
as values for categories
:
-- create temporary table which buffers the maximum article ID, because SELECT MAX can take a very long time on huge databases
DROP TABLE IF EXISTS max_article_id;
CREATE TEMP TABLE max_article_id(num INTEGER);
INSERT INTO max_article_id VALUES((SELECT MAX(id) FROM articles));
WITH RECURSIVE split(article_id, word, str, offsep) AS
(
VALUES ( 0, '', '', 0 ) -- begin with dummy article 0 (which does not actually exist) to avoid code duplication
UNION ALL
SELECT
CASE WHEN offsep==0 OR str IS NULL
THEN article_id+1 -- go to next article if the current one is finished
ELSE article_id -- and keep the current one in the opposite case
END,
CASE WHEN offsep==0 OR str IS NULL
THEN ''
ELSE substr(str, 0, CASE WHEN instr(str, ',') THEN instr(str, ',') ELSE length(str)+1 END)
END,
CASE WHEN offsep==0 OR str IS NULL -- when str==NULL, then there has been a NULL value for the categories cell of the current article
THEN (SELECT categories FROM articles WHERE id=article_id+1)
ELSE ltrim(substr(str, instr(str, ',')), ',')
END,
CASE WHEN offsep==0 OR str IS NULL -- offsep==0 means that the splitting was finished in the previous iteration
THEN 1 -- offsep==1 means that splitting the categories for a new article will begin in the next iteration
ELSE instr(str, ',') -- the actual string splitting stuff is explained and taken from here: http://stackoverflow.com/a/32051164
END
FROM split
WHERE article_id<=(SELECT * FROM max_article_id) -- stop getting new articles when the maximum article ID is reached
) SELECT article_id, word AS category FROM split WHERE word!=''; -- only select article_id and word from the result to use output the desired table layout
这篇关于拆分逗号分隔值并将它们映射到 SQLite 中的原始 ID的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!