How do I split in Pig a tuple of many maps into different rows
Problem description
I have a relation in Pig that looks like this:
([account_id#100,
timestamp#1434,
id#900],
[account_id#100,
timestamp#1434,
id#901],
[account_id#100,
timestamp#1434,
id#902])
As you can see, I have three map objects within a single tuple. All of the data above sits in field $0 of the relation, so the relation has a single bytearray column.
The data is loaded as follows:
data = load 's3://data/data' using com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad');
DESCRIBE data;
data: {bytearray}
How do I split this data structure into three rows so that the output is as follows?
data: {account_id:chararray, timestamp:chararray, id:int}
(100,1434,900)
(100,1434,901)
(100,1434,902)
It is difficult to diagnose the problem without sample input data. If this is an intermediate result, write it out with STORE and share the output file so that others can load it and try it out. I was able to solve this using STRSPLIT, but I am not sure whether you mean that the input is a single column in a single row, or three separate rows that share the same column.
In either case, flattening the data with the FLATTEN operator and then applying STRSPLIT should help. If I get more information and input data for the problem, I can give a working example.
Data -> FLATTEN to get out of the bag -> STRSPLIT on "," inside a FOREACH ... GENERATE
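To make the FLATTEN step concrete, here is a minimal sketch under stated assumptions, not a tested solution for the Elephant Bird loader. It assumes the field can be declared as a tuple of exactly three maps, and it reads a hypothetical local file `maps.txt` with PigStorage in place of the S3 JSON input. Note that FLATTEN applied to a tuple produces extra columns, not extra rows, so the sketch first wraps the maps in a bag with TOBAG. It also swaps STRSPLIT for direct map lookups (`m#'key'`), which only works if the values really arrive as maps and not as plain text. The aliases `t`, `m0`, `m1`, `m2`, `m`, `maps` and `rows` are illustrative names.

```pig
-- Hypothetical input: a local text file 'maps.txt' with one record per line, e.g.
-- ([account_id#100,timestamp#1434,id#900],[account_id#100,timestamp#1434,id#901],[account_id#100,timestamp#1434,id#902])
-- Assumption: the single field can be declared as a tuple of exactly three maps.
data = LOAD 'maps.txt' USING PigStorage()
       AS (t: tuple(m0: map[], m1: map[], m2: map[]));

-- FLATTEN on a tuple yields columns, so put the three maps into a bag first;
-- FLATTEN on a bag emits one output row per bag element.
maps = FOREACH data GENERATE FLATTEN(TOBAG(t.m0, t.m1, t.m2)) AS (m: map[]);

-- Look up each key and cast it to the type wanted in the output schema.
rows = FOREACH maps GENERATE
    (chararray) m#'account_id' AS account_id,
    (chararray) m#'timestamp'  AS timestamp,
    (int)       m#'id'         AS id;

DUMP rows;
```

If the number of maps per tuple varies, a fixed TOBAG call like this does not apply; the data would need to arrive as a bag (or be converted by a UDF) before FLATTEN can turn it into rows.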