通过 pl/sql 过程将带有行间隙和空值的逗号分隔值拆分为表中的列 [英] Split comma separated values with line gaps and nulls into columns in table via pl/sql procedure

查看:59
本文介绍了通过 pl/sql 过程将带有行间隙和空值的逗号分隔值拆分为表中的列的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我在表中有一个字符串,clob 值,我需要将其拆分为列.源表查询:

I have a string, clob value in table which i need to split into columns . Source table query:

Insert into disp_data(id,data) values(100,
'"Project title as per the outstanding Requirements","The values are not with respect to the requirement and analysis done by the team. 
Also it is difficult to prepare a scenario notwithstanding the fact it is difficult. This user story is going to be slightly complex however it is up to the team","Active","Disabled","25 tonnes of fuel","www.examplesites.com/html.asp&net;","","","","","25"');

在 clob 列值中也有空格、空值和行间距.所以当我尝试使用

In the clob column value there are spaces, null value and line gaps also. So when i try splitting it using

select regexp_substr(data,'[^,]+',1,level) from disp_data 
connect by regexp_substr(data,'[^,]+',1,level) is not null.

问题在于带有行间距的大文本,它将其拆分为不同的行.我曾想过使用上述结果集和数据透视表,但无法实现.

Problem is for the large text with line gaps, it is splitting it into different rows. I had thought of using the above result set and pivot but am unable to.

我需要将此数据作为列获取并推送到目标表-push_data_temp.

I need to get this data as columns and push in the destination table-push_data_temp.

select pid,col1,col2,col3,col4,col5,col6,col7,col8,col9,col10,col11 from push_data_temp;

clob 列有 11 个逗号分隔的值,需要将其作为列推送到此表中.这整个过程需要通过pl/sql程序来完成.

The clob column has 11 comma separated values that need to be pushed into this table as columns. This entire process needs to be done via pl/sql procedure.

push_data_temp 中的结果应如下所示.

The result in push_data_temp should look like this.

任何帮助将不胜感激.DB是oracle 19c

ANy help would be much appreciated. DB is oracle 19c

推荐答案

您的正则表达式 needs允许空值,即连续的逗号(但希望您在任何带引号的字符串中都没有逗号......).如果您有多个源行,那么使用递归 CTE 拆分会更容易:

Your regular expression needs to allow for nulls, i.e. consecutive commas (but hopefully you don't have commas within any of the quoted strings...). If you have multiple source rows then it's easier to split with a recursive CTE:

with rcte (id, data, lvl, result) as (
  select id, data, 1, regexp_substr(data, '(.*?)(,|$)', 1, 1, null, 1)
  from disp_data
  union all
  select id, data, lvl + 1, regexp_substr(data, '(.*?)(,|$)', 1, lvl + 1, null, 1)
  from rcte
  where lvl <= regexp_count(data, ',')
)
select id, lvl, result
from rcte
order by id, lvl;

然后您可以将结果转换为您想要的列:

You can then pivot the result into the columns you want:

with rcte (id, data, lvl, result) as (
  select id, data, 1, regexp_substr(data, '(.*?)(,|$)', 1, 1, null, 1)
  from disp_data
  union all
  select id, data, lvl + 1, regexp_substr(data, '(.*?)(,|$)', 1, lvl + 1, null, 1)
  from rcte
  where lvl <= regexp_count(data, ',')
)
select *
from (
  select id, lvl, result
  from rcte
)
pivot (max(result) as col for (lvl) in (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11));

db<>fiddle

你可以直接在插入语句中使用它:

And you can use that directly in an insert statement:

insert into push_data_temp (pid,col1,col2,col3,col4,col5,col6,col7,col8,col9,col10,col11)
with rcte (id, data, lvl, result) as (
  select id, data, 1, regexp_substr(data, '(.*?)(,|$)', 1, 1, null, 1)
  from disp_data
  union all
  select id, data, lvl + 1, regexp_substr(data, '(.*?)(,|$)', 1, lvl + 1, null, 1)
  from rcte
  where lvl <= regexp_count(data, ',')
)
select *
from (
  select id, lvl, result
  from rcte
)
pivot (max(result) as col for (lvl) in (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11));

db<>fiddle

不需要 PL/SQL,但如果您愿意,您仍然可以将其包装在一个过程中.

No PL/SQL needed, but you can still wrap it in a procedure if you want to.

我必须将其视为clob,并且将错误视为不一致的数据类型

I have to take as clob and it is throwing error as inconsistent datatype

您需要将令牌转换为 varchar2,这限制了它们的长度(4k 或 32k,取决于 Oracle 版本和设置):

You need to cast the tokens as varchar2, which limits their length (either 4k or 32k depending on Oracle version and settings):

with rcte (id, data, lvl, result) as (
  select id, data, 1,
    cast(regexp_substr(data, '(.*?)(,|$)', 1, 1, null, 1) as varchar2(4000))
  from disp_data
  union all
  select id, data, lvl + 1,
    cast(regexp_substr(data, '(.*?)(,|$)', 1, lvl + 1, null, 1) as varchar2(4000))
  from rcte
  where lvl <= regexp_count(data, ',')
)
...

db<>fiddle 带有 CLOB(并删除了连接示例,因为他们打破了它......)

db<>fiddle with CLOB (and connect-by examples removed, as they break it...)

当我尝试在文本之间使用逗号时,它会不均匀地拆分数据.

when i try for text with commas in between, it splits data unevenly.

这就是为什么我说希望您在任何带引号的字符串中都没有逗号".因为你没有任何真正的空元素 - 你有 ...",""... 而不是 ...,,... - 你可以跳过我认为的那些问题,并使用不同的模式:

That's why I said "hopefully you don't have commas within any of the quoted strings". As you don't have any really empty elements - you have ...","","... rather than ...,,... - you can skip the concern about those I suppose, and use a different pattern:

with rcte (id, data, lvl, result) as (
  select id, data, 1,
    cast(regexp_substr(data, '("[^"]*"|[^,]+)', 1, 1, null, 1) as varchar2(4000))
  from disp_data
  union all
  select id, data, lvl + 1,
    cast(regexp_substr(data, '("[^"]*"|[^,]+)', 1, lvl + 1, null, 1) as varchar2(4000))
  from rcte
  where lvl <= regexp_count(data, '("[^"]*"|[^,]+)')
)
...

db<>fiddle

如果您确实必须处理空元素,那么它仍然是可能的,但需要更多工作.这也不会处理没有字符串的转义双引号.在某些时候,用 PL/SQL 编写自己的解析器会更容易;甚至将数据写入磁盘,然后将其作为外部表读回,它可以为您处理所有这些.

If you did have to deal with null elements then it's still possible, but more work. This also won't deal with escaped double-quotes without strings. At some point it will be easier to write your own parser in PL/SQL; or even to write the data to disk and read it back in as an external table which can handle all of this for you.

这篇关于通过 pl/sql 过程将带有行间隙和空值的逗号分隔值拆分为表中的列的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆