Split comma separated values into target table with fixed number of columns


Question

I have a table with a single column in a Postgres 13.1 database. It consists of many rows of comma-separated values, around 20 elements at most.

I want to split the data into multiple columns. But the target has only a limited number of columns, say 5, while a single row can hold more than 5 CSV values, so excess values must be shifted to a new (next) row. How to do this?

Example:

a1, b1, c1
a2, b2, c2, d2, e2, f2
a3, b3, c3, d3, e3, f3, g3, h3, i3, j3
a4
a5, b5, c5
'
'
'

There are only 5 columns, so the output would look like this:

c1 c2 c3 c4 c5
---------------
a1 b1 c1
a2 b2 c2 d2 e2 
f2
a3 b3 c3 d3 e3
f3 g3 h3 i3 j3
a4
a5 b5 c5
'
'
'

Recommended answer

It is typically bad design to store CSV values in a single column. If at all possible, use an array or a properly normalized design instead.
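For illustration, a rough sketch of what the array and the normalized alternatives could look like. The inline sample rows and the names elems, elem and ord are assumptions made for this example, not part of the original setup.

```sql
-- Array design: one array value per row.
-- The VALUES list is hypothetical sample data standing in for the real table.
WITH tbl(id, csv) AS (
   VALUES (1, 'a1, b1, c1')
        , (2, 'a2, b2, c2, d2, e2, f2')
   )
SELECT id, string_to_array(csv, ', ') AS elems
FROM   tbl;

-- Normalized design: one row per element, together with its position.
WITH tbl(id, csv) AS (
   VALUES (1, 'a1, b1, c1')
        , (2, 'a2, b2, c2, d2, e2, f2')
   )
SELECT t.id, u.ord, u.elem
FROM   tbl t
CROSS  JOIN LATERAL unnest(string_to_array(t.csv, ', '))
                    WITH ORDINALITY AS u(elem, ord)
ORDER  BY t.id, u.ord;
```

Either form lets you address elements by position directly instead of re-parsing the string in every query.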

While stuck with your current situation ...

A simple solution without trickery or recursion will do:

SELECT id, 1 AS rnk
     , split_part(csv, ', ', 1) AS c1
     , split_part(csv, ', ', 2) AS c2
     , split_part(csv, ', ', 3) AS c3
     , split_part(csv, ', ', 4) AS c4
     , split_part(csv, ', ', 5) AS c5
FROM   tbl
WHERE  split_part(csv, ', ', 1) <> '' -- skip empty rows

UNION ALL
SELECT id, 2
     , split_part(csv, ', ', 6)
     , split_part(csv, ', ', 7)
     , split_part(csv, ', ', 8)
     , split_part(csv, ', ', 9)
     , split_part(csv, ', ', 10)
FROM   tbl
WHERE  split_part(csv, ', ', 6) <> '' -- skip empty rows

-- three more blocks to cover a maximum "around 20"

ORDER  BY id, rnk;

db<>fiddle here

id being the PK of the original table.
This assumes ', ' as separator, obviously.
You can adapt easily.
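To try the query, a test table matching the sample from the question might look like the sketch below. Only the names tbl, id and csv come from the query; the column types and the identity PK are assumptions.

```sql
-- Assumed table definition; adjust types to match your real table.
CREATE TABLE tbl (
   id  int GENERATED ALWAYS AS IDENTITY PRIMARY KEY
 , csv text
);

-- Sample rows taken from the question.
INSERT INTO tbl (csv) VALUES
   ('a1, b1, c1')
 , ('a2, b2, c2, d2, e2, f2')
 , ('a3, b3, c3, d3, e3, f3, g3, h3, i3, j3')
 , ('a4')
 , ('a5, b5, c5');
```

Note that split_part() returns an empty string, not NULL, when the requested field does not exist, so short rows show '' in the trailing columns.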


Various ways. One way is to use regexp_replace() to replace every fifth separator before unnesting ...

-- for any number of elements
SELECT t.id, c.rnk
     , split_part(c.csv5, ', ', 1) AS c1
     , split_part(c.csv5, ', ', 2) AS c2
     , split_part(c.csv5, ', ', 3) AS c3
     , split_part(c.csv5, ', ', 4) AS c4
     , split_part(c.csv5, ', ', 5) AS c5
FROM   tbl t
     , unnest(string_to_array(regexp_replace(csv, '((?:.*?,){4}.*?),', '\1;', 'g'), '; ')) WITH ORDINALITY c(csv5, rnk)
ORDER  BY t.id, c.rnk;

db<>fiddle here

This assumes that the chosen separator ; never appears in your strings. (Just like , can never appear.)

The regular expression pattern is the key: '((?:.*?,){4}.*?),'

(?:) ... "non-capturing" set of parentheses
() ... "capturing" set of parentheses
*? ... non-greedy quantifier
{4} ... sequence of exactly 4 matches

The replacement '\1;' contains the back-reference \1.

'g' as fourth function parameter is required for repeated replacement.
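To see the two steps in isolation, the following sketch applies them to a single literal taken from the question's sample data. It assumes the default setting standard_conforming_strings = on, so that '\1;' is read as a literal backslash followed by 1.

```sql
-- Step 1: every fifth comma becomes a semicolon.
SELECT regexp_replace('a3, b3, c3, d3, e3, f3, g3, h3, i3, j3'
                    , '((?:.*?,){4}.*?),', '\1;', 'g') AS csv5;
-- should return: a3, b3, c3, d3, e3; f3, g3, h3, i3, j3

-- Step 2: split on '; ' and number the resulting chunks.
SELECT c.rnk, c.csv5
FROM   unnest(string_to_array('a3, b3, c3, d3, e3; f3, g3, h3, i3, j3', '; '))
       WITH ORDINALITY AS c(csv5, rnk);
```

A string with five or fewer elements contains no fifth comma, so it passes through step 1 unchanged and yields a single chunk.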


Other ways to solve this include a recursive CTE or a set-returning function ...
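As one possible variant along those lines, the sketch below relies on the built-in set-returning function unnest() with conditional aggregation. It is neither a recursive CTE nor a custom function, and it is not taken from the answer. The inline sample rows and the names elem, ord and pos are assumptions for illustration.

```sql
-- Hypothetical sample rows standing in for the real table.
WITH tbl(id, csv) AS (
   VALUES (1, 'a1, b1, c1')
        , (2, 'a2, b2, c2, d2, e2, f2')
   )
SELECT t.id, e.rnk
     , max(e.elem) FILTER (WHERE e.pos = 1) AS c1
     , max(e.elem) FILTER (WHERE e.pos = 2) AS c2
     , max(e.elem) FILTER (WHERE e.pos = 3) AS c3
     , max(e.elem) FILTER (WHERE e.pos = 4) AS c4
     , max(e.elem) FILTER (WHERE e.pos = 5) AS c5
FROM   tbl t
CROSS  JOIN LATERAL (
   SELECT u.elem
        , (u.ord - 1) / 5 + 1 AS rnk  -- output row number within the source row
        , (u.ord - 1) % 5 + 1 AS pos  -- target column number, 1 to 5
   FROM   unnest(string_to_array(t.csv, ', ')) WITH ORDINALITY AS u(elem, ord)
   ) e
GROUP  BY t.id, e.rnk
ORDER  BY t.id, e.rnk;
```

Unlike the split_part() queries, this leaves missing trailing columns as NULL rather than as empty strings, and it needs no reserved second separator.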

To fill the columns starting from the right instead (like you added in "How to put values starting from the right side into columns?"), simply count down the numbers like:

SELECT t.id, c.rnk
     , split_part(c.csv5, ', ', 5) AS c1
     , split_part(c.csv5, ', ', 4) AS c2
     , split_part(c.csv5, ', ', 3) AS c3
     , split_part(c.csv5, ', ', 2) AS c4
     , split_part(c.csv5, ', ', 1) AS c5
FROM ...

db<>fiddle here
