在Informatica中将平面文件字符串拆分为多列 [英] Split Flat File String into multiple columns in Informatica

查看:201
本文介绍了在Informatica中将平面文件字符串拆分为多列的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个平面文件(csv),其中有1列(ColumnA),其数据格式为==> 100-000500-10-40000-000-200-600-000

I have a flat file (csv) with 1 column (ColumnA) having data in the format ==> 100-000500-10-40000-000-200-600-000

我需要将其拆分为8列(与连字符相邻的值),并串联2,4和5拆分后的子值.如何在表达式转换中做到这一点?我尝试了以下操作,但没有帮助,表格正在将这些字段加载为空白.

I need it split into 8 columns (values adjacent to the hyphens) and concatenate the 2,4 and 5 splitted sub values. How do I do this in an expression transformation? I have tried the following but didnt help, table is loading these fields as blank.

可变端口1:SUBSTR(ColumnA,3,6) 可变端口2:SUBSTR(ColumnA,12,5) 可变端口3:SUBSTR(ColumnA,18,3)

Variable port 1: SUBSTR(ColumnA,3,6) Variable port 2: SUBSTR(ColumnA,12,5) Variable port 3: SUBSTR(ColumnA,18,3)

然后,我分别设置了其中三个的输出端口V_ACCOUNT_SEG2_CODE,V_ACCOUNT_SEG4_CODE,V_ACCOUNT_SEG5_CODE,然后按如下所示进行连接:

Then I have made output ports of each 3 of them V_ACCOUNT_SEG2_CODE, V_ACCOUNT_SEG4_CODE, V_ACCOUNT_SEG5_CODE and then concatenated like below:

最终变量端口:RTRIM(V_ACCOUNT_SEG2_CODE ||'-'|| V_ACCOUNT_SEG4_CODE ||'-'|| V_ACCOUNT_SEG5_CODE,'-').

Final Variable Port: RTRIM(V_ACCOUNT_SEG2_CODE||'-'||V_ACCOUNT_SEG4_CODE||'-'||V_ACCOUNT_SEG5_CODE,'-').

有人可以建议代码有什么问题吗?什么是正确的有效代码?我正在从ColumnA(csv平面文件)中的数据获取到我的目标表(Oracle),但是这些其他列为空.

Can someone advice what is wrong with the code and what can be the correct working code? I am getting data from ColumnA (csv flat file) to my target table (Oracle) but these additional columns are coming as null.

我的表达式在这里

推荐答案

您需要结合使用SUBSTR和INSTR.您不应该使用硬编码值,因为它们可以根据您的数据而变化.可能就是您的映射中正在发生的事情.
v_col1= SUBSTR (data_col,1,instr(data_col,'-')-1 ) v_col2 = SUBSTR (data_col,instr(data_col,'-')+1, instr(data_col,'-',1,2)-1 ) v_col3 = SUBSTR (data_col,instr(data_col,'-',1,2)+1,instr(data_col,'-',1,3)-1 ) ...
另外,您也可以像下面一样使用REGEX.但是问题是,只有当数据具有3个-"(如'abc-defg-hij')时,此方法才有效.任何其他组合,拆分均无效.

You need to use combination of SUBSTR and INSTR. You shouldn't use hard code values since they can change depending on your data. Probably that's what is happening in your mapping.
v_col1= SUBSTR (data_col,1,instr(data_col,'-')-1 ) v_col2 = SUBSTR (data_col,instr(data_col,'-')+1, instr(data_col,'-',1,2)-1 ) v_col3 = SUBSTR (data_col,instr(data_col,'-',1,2)+1,instr(data_col,'-',1,3)-1 ) ...
Alternately you can use REGEX as well like below. But issue is this works ONLY when you have a data with 3 '-' like 'abc-defg-hij'. Any other combination, split wont work.

v_col1:= REG_EXTRACT(data_col,'([^-]+)-+([^-])-+([^-]+)',1) v_col2:= REG_EXTRACT(data_col,'([^-]+)-+([^-])-+([^-]+)',2) v_col3:= REG_EXTRACT(data_col,'([^-]+)-+([^-])-+([^-]+)',3) ... 或者,您可以将文件数据加载到临时oracle表中,并使用regex_substr()进行拆分.

v_col1:= REG_EXTRACT(data_col,'([^-]+)-+([^-])-+([^-]+)',1) v_col2:= REG_EXTRACT(data_col,'([^-]+)-+([^-])-+([^-]+)',2) v_col3:= REG_EXTRACT(data_col,'([^-]+)-+([^-])-+([^-]+)',3) ... Alternately you can load the file data into a temp oracle table and use regex_substr() to split them.

这篇关于在Informatica中将平面文件字符串拆分为多列的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆