Talend - Merge two rows of a delimited file to get one single register


Problem description

I'm parsing delimited files to get information that I will put into a database table.

Now I have a file in which every two rows must be merged to get the information for one single register (one row of the database table): line 1 holds some fields of the database row, and line 2 holds the other fields of that same row.

How can I process two rows at a time?

For example, a file with 6 rows corresponds to 3 entries in my database table, which has 9 columns. From the "odd rows" I get columns 1, 3, 4, 5, 8 and 9. From the "even rows" I get the remaining information (columns 2, 6 and 7):

IN  | COLUMN1 | xxxxxxx  | COLUMN3 | COLUMN4 | COLUMN5 | xxxxxxx | xxxxxxx | COLUMN8

OUT | xxxxxxx | COLUMN2 | xxxxxxx  | xxxxxxx | xxxxxxx | COLUMN6 | COLUMN7 | xxxxxxx

IN  | COLUMN1 | xxxxxxx  | COLUMN3 | COLUMN4 | COLUMN5 | xxxxxxx | xxxxxxx | COLUMN8

OUT | xxxxxxx | COLUMN2 | xxxxxxx  | xxxxxxx | xxxxxxx | COLUMN6 | COLUMN7 | xxxxxxx

IN  | COLUMN1 | xxxxxxx  | COLUMN3 | COLUMN4 | COLUMN5 | xxxxxxx | xxxxxxx | COLUMN8

OUT | xxxxxxx | COLUMN2 | xxxxxxx  | xxxxxxx | xxxxxxx | COLUMN6 | COLUMN7 | xxxxxxx

Recommended answer

You could try splitting the file into your two types of rows and then using a tMap to join them.

To clarify further, you'll want to split the file depending on whether a row is an IN or an OUT, and then use a tMap to join the columns as needed.

I've modified your example data slightly so that it looks like this:

|=---+-----------+-----------+-----------+-----------+-----------+-----------+-----------+----------=|
|IN1 |ROW1COLUMN1|xxxxxxx    |ROW1COLUMN3|ROW1COLUMN4|ROW1COLUMN5|xxxxxxx    |xxxxxxx    |ROW1COLUMN8|
|OUT1|xxxxxxx    |ROW1COLUMN2|xxxxxxx    |xxxxxxx    |xxxxxxx    |ROW1COLUMN6|ROW1COLUMN7|xxxxxxx    |
|IN2 |ROW2COLUMN1|xxxxxxx    |ROW2COLUMN3|ROW2COLUMN4|ROW2COLUMN5|xxxxxxx    |xxxxxxx    |ROW2COLUMN8|
|OUT2|xxxxxxx    |ROW2COLUMN2|xxxxxxx    |xxxxxxx    |xxxxxxx    |ROW2COLUMN6|ROW2COLUMN7|xxxxxxx    |
|IN3 |ROW3COLUMN1|xxxxxxx    |ROW3COLUMN3|ROW3COLUMN4|ROW3COLUMN5|xxxxxxx    |xxxxxxx    |ROW3COLUMN8|
|OUT3|xxxxxxx    |ROW3COLUMN2|xxxxxxx    |xxxxxxx    |xxxxxxx    |ROW3COLUMN6|ROW3COLUMN7|xxxxxxx    |
'----+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------'

The only real addition is a key next to the IN or OUT in the first column, which indicates how the rows should be joined.

First you'll want to split the data into its IN and OUT parts with a tMap.

This simply sends the data down one of two paths depending on whether the Id field begins with "IN" or "OUT".
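The routing rule can be sketched in plain Java, outside Talend. This is only an illustration of the logic, not the tMap configuration itself: the class `InOutSplitter`, its `Split` holder and the assumption that each line is pipe-delimited with the Id as its first field (no leading or trailing `|`) are all hypothetical. In a tMap, the same idea would usually be written as one filter expression per output, along the lines of `row1.Id.startsWith("IN")`, where `row1` and `Id` are assumed names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not a Talend API: mimics a tMap with two filtered outputs.
class InOutSplitter {

    // Holds the two flows produced by the split.
    static final class Split {
        final List<String[]> in = new ArrayList<>();
        final List<String[]> out = new ArrayList<>();
    }

    // Routes each pipe-delimited line by the prefix of its first field (the Id).
    static Split split(List<String> lines) {
        Split result = new Split();
        for (String line : lines) {
            String[] fields = line.split("\\|", -1);
            for (int i = 0; i < fields.length; i++) {
                fields[i] = fields[i].trim(); // drop padding around each value
            }
            String id = fields[0];
            if (id.startsWith("IN")) {
                result.in.add(fields);
            } else if (id.startsWith("OUT")) {
                result.out.add(fields);
            }
            // Rows with any other prefix are ignored in this sketch.
        }
        return result;
    }
}
```

Calling `InOutSplitter.split(lines)` on the six sample rows would leave three rows in `in` and three in `out`.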

After this you'll want to recombine the two flows with another tMap.

This joins on the key extracted from the Id field and uses the appropriate columns in the combined output.
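As a rough illustration of that join, the following Java sketch pairs each IN row with the OUT row that shares its key and picks columns 2, 6 and 7 from the OUT row and the rest from the IN row, as in the sample data. The class `InOutJoiner`, the `key` helper and the row layout (Id at index 0, column n at index n) are assumptions made for the example; it behaves like an inner join, which is one possible choice of tMap join model.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not a Talend API: mimics a tMap lookup join on an extracted key.
class InOutJoiner {

    // 1-based column numbers taken from the OUT row; all others come from the IN row.
    static final int[] OUT_COLUMNS = {2, 6, 7};

    // Strips the IN/OUT prefix so that "IN1" and "OUT1" share the key "1".
    static String key(String id) {
        return id.replaceFirst("^(IN|OUT)", "");
    }

    // Rows are arrays with the Id at index 0 and column n at index n.
    static List<String[]> join(List<String[]> inRows, List<String[]> outRows) {
        Map<String, String[]> outByKey = new HashMap<>();
        for (String[] out : outRows) {
            outByKey.put(key(out[0]), out);
        }
        List<String[]> merged = new ArrayList<>();
        for (String[] in : inRows) {
            String[] out = outByKey.get(key(in[0]));
            if (out == null) {
                continue; // inner join: skip IN rows without a matching OUT row
            }
            String[] row = new String[in.length - 1]; // the Id itself is not kept
            System.arraycopy(in, 1, row, 0, row.length);
            for (int c : OUT_COLUMNS) {
                if (c < in.length && c < out.length) {
                    row[c - 1] = out[c]; // overwrite placeholder with the OUT value
                }
            }
            merged.add(row);
        }
        return merged;
    }
}
```

Using a map for the OUT rows mirrors how a lookup input is matched against the main input, so the order of the OUT rows does not matter here.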

Unfortunately, you can't split a flow with a tMap and then rejoin it straight back into another tMap. The best bet is to output it to two separate places (either database tables or temporary CSV files) and, once that subjob is complete, read those two places back in and recombine them with the second tMap.
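The staging step can be pictured with a small Java sketch that writes one flow to a temporary CSV file and reads it back later, standing in for the file output in the first subjob and the file input in the second. The class `FlowStaging` and its method names are hypothetical; in a real job the file components would do this work.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper, not a Talend API: stages a flow in a temporary CSV file.
class FlowStaging {

    // First subjob: write one flow (IN or OUT) to its own temporary file.
    static Path stage(String prefix, List<String> lines) {
        try {
            Path file = Files.createTempFile(prefix, ".csv");
            file.toFile().deleteOnExit(); // clean up when the JVM exits
            return Files.write(file, lines, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Second subjob: read a staged flow back, in its original order.
    static List<String> load(Path file) {
        try {
            return Files.readAllLines(file, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Staging each flow separately means both are complete on disk before the recombining step begins.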


If you don't have a natural key to join on, you could generate one by taking the outputs of the first tMap and adding a column whose value is given by a Numeric.sequence expression.
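To show what such a generated key amounts to, here is a minimal Java sketch of a named counter that numbers the rows of each flow independently, so the first IN row and the first OUT row both receive key 1. The class `SequenceKeys` is a hypothetical stand-in written for this example, not Talend's routine; in a job the column expression would be something like `Numeric.sequence("in_seq", 1, 1)`, with the sequence name chosen freely. The approach assumes that the file order is preserved and that IN and OUT rows strictly alternate.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a named sequence generator; not Talend's own routine.
class SequenceKeys {

    private final Map<String, Integer> current = new HashMap<>();

    // Returns start, start + step, start + 2 * step, ... separately for each name.
    int next(String name, int start, int step) {
        Integer previous = current.get(name);
        int value = (previous == null) ? start : previous + step;
        current.put(name, value);
        return value;
    }

    // Prepends a generated key (1, 2, 3, ...) to every row of one flow.
    List<String[]> withKey(String name, List<String[]> rows) {
        List<String[]> keyed = new ArrayList<>();
        for (String[] row : rows) {
            String[] out = new String[row.length + 1];
            out[0] = String.valueOf(next(name, 1, 1));
            System.arraycopy(row, 0, out, 1, row.length);
            keyed.add(out);
        }
        return keyed;
    }
}
```

Giving the IN flow and the OUT flow different sequence names keeps the two counters apart, so the nth row of each flow ends up with the same key.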
