How do I split in Pig a tuple of many maps into different rows


Problem description

I have a relation in Pig that looks like this:

([account_id#100,
 timestamp#1434,
 id#900],

[account_id#100,
 timestamp#1434,
 id#901],

[account_id#100,
 timestamp#1434,
 id#902])

As you can see, I have three map objects within a tuple. All of the data above is within field $0 of the relation, so the relation has a single bytearray column.

The data is loaded as follows:

data = load 's3://data/data' using com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad');

DESCRIBE data;

data: {bytearray}

How do I split this data structure into three rows so that the output is as follows?

data: {account_id:chararray, timestamp:chararray, id:int}
(100,1434,900)
(100,1434,901)
(100,1434,902)

Solution

It is very difficult to guess the problem without sample input data. If this is an intermediate result, write it out with STORE and post the output file so that we can load it and try things out. I was able to solve this using STRSPLIT, but I am not sure whether you mean that the input is a single column in a single row, or three different rows with the same column.

In either case, flattening the data with the FLATTEN operator and then applying STRSPLIT should help. If I get more information and input data for the problem, I can give a working example.

Data -> FLATTEN to get out of the bag -> STRSPLIT over "," in a FOREACH ... GENERATE
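
To make that pipeline concrete, here is a minimal sketch, not a tested solution for the JsonLoader output. It assumes the intermediate result has been written out as plain text in a hypothetical file records.txt, with the records of one row on a single line, separated by "|" and with comma-separated fields, for example 100,1434,900|100,1434,901|100,1434,902. The file name, the "|" separator and the relation names are all made up for illustration; adjust them to whatever your STORE output actually looks like.

```pig
-- Assumption: records.txt is a hypothetical text file; each line looks like
--   100,1434,900|100,1434,901|100,1434,902
raw = LOAD 'records.txt' USING TextLoader() AS (line:chararray);

-- TOKENIZE returns a bag with one single-field tuple per "|"-separated record;
-- FLATTEN on that bag produces one output row per record.
recs = FOREACH raw GENERATE FLATTEN(TOKENIZE(line, '|')) AS (rec:chararray);

-- STRSPLIT returns a tuple; FLATTEN on a tuple spreads its fields into columns.
parts = FOREACH recs GENERATE
    FLATTEN(STRSPLIT(rec, ',', 3))
    AS (account_id:chararray, timestamp:chararray, id:chararray);

-- Cast id to int to match the schema asked for in the question.
rows = FOREACH parts GENERATE account_id, timestamp, (int) id AS id;

DESCRIBE rows;
DUMP rows;
```

Note that FLATTEN behaves differently on the two types: on a bag it multiplies rows, on a tuple it only adds columns. If the input really is a tuple of maps rather than a bag, it has to be turned into a bag first before FLATTEN can produce separate rows.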
