Pivot data in Talend
Question
I have some data which I need to pivot in Talend. This is a sample:
brandname,metric,value
A,xyz,2
B,xyz,2
A,abc,3
C,def,1
C,ghi,6
A,ghi,1
Now I need this data to be pivoted on the metric column like this:
brandname,abc,def,ghi,xyz
A,3,null,1,2
B,null,null,null,2
C,null,1,6,null
Currently I am using tPivotToColumnsDelimited to pivot the data to a file and then reading it back from that file. However, having to store the data in an external file and read it back is messy and adds unnecessary overhead.
Is there a way to do this in Talend without writing to an external file? I tried tDenormalize, but as far as I understand it returns the rows as a single column, which is not what I need. I also looked for third-party components on Talend Exchange but couldn't find anything useful.
Thanks for your help.
Recommended answer
Assuming that your metrics are fixed, you can use their names as the columns of the output. The pivot has two parts: first, a tMap that transposes the value of each input row (in) into the corresponding column of the output row (out); second, a tAggregate that groups the tMap's output rows by brandname.
For the tMap, you have to fill each output column conditionally. For example, for the output column named "abc":
out.abc = "abc".equals(in.metric) ? in.value : null
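Applied to the four metrics in the sample data, the same pattern would be repeated once per output column. This is only a sketch: it assumes the output columns are named after the metrics and are declared nullable in the tMap output schema, and that brandname is passed through unchanged so the next step can group on it.

```java
// Expressions for the tMap output row "out" (one per metric column).
// Assumption: abc, def, ghi and xyz are nullable output columns.
out.brandname = in.brandname
out.abc = "abc".equals(in.metric) ? in.value : null
out.def = "def".equals(in.metric) ? in.value : null
out.ghi = "ghi".equals(in.metric) ? in.value : null
out.xyz = "xyz".equals(in.metric) ? in.value : null
```

Each input row therefore yields an output row with exactly one metric column filled and the others null.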
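In the tAggregate, group by out.brandname and aggregate each metric column as a sum, ignoring nulls. Each tMap output row carries a value in only one metric column, so the sum collapses a brand's rows into a single row; if the same brand/metric pair occurs more than once, its values are added together.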
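To make the two steps concrete outside of Talend, here is a minimal plain-Java sketch of the same logic. It is an illustration, not Talend-generated code: the class PivotSketch and the methods spread and pivot are hypothetical names, rows are modelled as String arrays of brandname, metric and value, and the metric list is assumed to be fixed. In this sketch a column that never receives a value stays null; what tAggregate emits in that case may depend on the column type and settings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class PivotSketch {
    // Fixed metric names in output column order (assumed known up front).
    static final String[] METRICS = {"abc", "def", "ghi", "xyz"};

    // Step 1 (the tMap part): put the value into the matching column, null elsewhere.
    static Integer[] spread(String metric, Integer value) {
        Integer[] out = new Integer[METRICS.length];
        for (int i = 0; i < METRICS.length; i++) {
            out[i] = METRICS[i].equals(metric) ? value : null;
        }
        return out;
    }

    // Step 2 (the tAggregate part): group by brandname, sum each column, skip nulls.
    static Map<String, Integer[]> pivot(List<String[]> rows) {
        Map<String, Integer[]> byBrand = new TreeMap<>();
        for (String[] row : rows) {
            Integer[] sparse = spread(row[1], Integer.valueOf(row[2]));
            Integer[] acc = byBrand.computeIfAbsent(row[0], k -> new Integer[METRICS.length]);
            for (int i = 0; i < sparse.length; i++) {
                if (sparse[i] == null) {
                    continue; // ignore nulls
                }
                if (acc[i] == null) {
                    acc[i] = sparse[i];
                } else {
                    acc[i] = acc[i] + sparse[i];
                }
            }
        }
        return byBrand;
    }

    public static void main(String[] args) {
        // Sample data from the question.
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {"A", "xyz", "2"});
        rows.add(new String[] {"B", "xyz", "2"});
        rows.add(new String[] {"A", "abc", "3"});
        rows.add(new String[] {"C", "def", "1"});
        rows.add(new String[] {"C", "ghi", "6"});
        rows.add(new String[] {"A", "ghi", "1"});

        System.out.println("brandname," + String.join(",", METRICS));
        for (Map.Entry<String, Integer[]> e : pivot(rows).entrySet()) {
            StringBuilder line = new StringBuilder(e.getKey());
            for (Integer v : e.getValue()) {
                line.append(',').append(v);
            }
            System.out.println(line);
        }
    }
}
```

Run on the sample rows, this prints the pivoted table requested in the question, with null in every column a brand has no value for.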