How do I split a tuple of many maps into separate rows in Pig?

Problem description

I have a relation in Pig that looks like this:

([account_id#100,
 timestamp#1434,
 id#900],

[account_id#100,
 timestamp#1434,
 id#901],

[account_id#100,
 timestamp#1434,
 id#902])

As you can see, I have three map objects within a tuple. All of the data above is in field $0 of the relation, so the relation has a single bytearray column.

The data is loaded as follows:

data = load 's3://data/data' using com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad');

DESCRIBE data;

data: {bytearray}

How do I split this data structure into three rows so that the output is as follows?

data: {account_id:chararray, timestamp:chararray, id:int}
(100,1434,900)
(100,1434,901)
(100,1434,902)

Recommended answer

It is very difficult to diagnose your problem without sample input data. If this is an intermediate result, write it out with STORE and post the output file so that we can use it as input and try things out. I was able to solve this using STRSPLIT, but I am not sure whether you mean that the input is a single column in a single row, or three different rows with the same column.
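
A rough, untested sketch of the STRSPLIT route. It assumes the intermediate result has been stored as plain text with one map per line in PigStorage's map format (for example [account_id#100,timestamp#1434,id#900]) and that the three keys always appear in the same order. The file name maps.txt and the aliases raw, parts and k1..k3 are hypothetical:

```pig
-- Hypothetical input 'maps.txt': one map per line, e.g. [account_id#100,timestamp#1434,id#900]
raw = LOAD 'maps.txt' USING TextLoader() AS (line:chararray);

-- Drop the surrounding [ ], then split on '#' or ',' (ignoring nearby spaces)
-- into six strings: key, value, key, value, key, value
parts = FOREACH raw GENERATE
    FLATTEN(STRSPLIT(TRIM(REPLACE(line, '[\\[\\]]', '')), '\\s*[#,]\\s*', 6))
    AS (k1:chararray, account_id:chararray,
        k2:chararray, timestamp:chararray,
        k3:chararray, id:chararray);

-- Keep only the values; cast id to int to match the target schema
result = FOREACH parts GENERATE account_id, timestamp, (int) id AS id;

DESCRIBE result;
DUMP result;
```

Because it relies on key order rather than key names, this only works while every line lists account_id, timestamp and id in that order.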

In either case, flattening the data with the FLATTEN operator and then applying STRSPLIT should help. If I get more information and the input data for the problem, I can give a working example.

Data -> FLATTEN to get the values out of the bag -> STRSPLIT on "," in a FOREACH ... GENERATE
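
An untested sketch of the same FLATTEN-first idea applied directly to the loaded data. Since the question says $0 holds a tuple of three maps rather than a bag, it first flattens the tuple into three map columns, then uses TOBAG plus FLATTEN to produce one row per map, and reads each value with the map dereference operator # instead of STRSPLIT (a swapped-in technique, not what the answer above describes). It assumes exactly three maps per record and that the casts from bytearray succeed with this loader; the aliases cols, rows, result and m1..m3 are hypothetical:

```pig
data = LOAD 's3://data/data' USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad');

-- Assumption: $0 is a tuple of exactly three maps at runtime.
-- FLATTEN on a tuple yields columns, so this is still one row per record.
cols = FOREACH data GENERATE FLATTEN($0) AS (m1:map[], m2:map[], m3:map[]);

-- Wrap the three maps in a bag, then FLATTEN the bag: one row per map
rows = FOREACH cols GENERATE FLATTEN(TOBAG(m1, m2, m3)) AS (m:map[]);

-- Look up each value by key and cast to the target types
result = FOREACH rows GENERATE
    (chararray) m#'account_id' AS account_id,
    (chararray) m#'timestamp'  AS timestamp,
    (int) m#'id'               AS id;

DESCRIBE result;
DUMP result;
```

If the number of maps varies between records, the fixed m1/m2/m3 schema will not fit; in that case the field would need to arrive as a bag, so that a single FLATTEN yields one row per map.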
