In Apache Pig, how can I serialise columns into rows?


Question

In Apache Pig, I want to serialise the columns held in a variable into rows. More specifically:

The data loaded into the variable looks like this when printed with DUMP:

(val1a, val2a,.... )
(val1b, val2b,val3b,.... )
(val1c, val2c,.... )
.
.
.

I want to transform it into

(val1a)
(val2a)
.
.
.
(val1b)
(val2b)
(val3b)
.
.
.
(val1c)
(val2c)
.
.
.

So each column has to be "serialised" into a row of its own, and the resulting rows are appended one after another. Please note: I do not necessarily know how many columns each row has.

How can I do this in Pig Latin? It would be easy in, e.g., Python, but I don't know how to do it in Pig. I tried different foreach ... generate constructs, but could not make any of them work.

Recommended answer

One way to unfold each tuple into multiple tuples, each containing a single field:

$ cat data.txt
val1a,val2a,val3a,val4a,val5a,val6a,val7a
val1b,val2b,val3b
val1c,val2c

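-- Load each line as a tuple of comma-separated fields (no schema, so row widths may differ).
A = load 'data.txt' using PigStorage(',');
-- TOBAG(*) puts every field of the row into a bag; FLATTEN emits one tuple per bag element.
B = foreach A generate FLATTEN(TOBAG(*));
dump B;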

(val1a)
(val2a)
(val3a)
(val4a)
(val5a)
(val6a)
(val7a)
(val1b)
(val2b)
(val3b)
(val1c)
(val2c)
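
Since the question mentions that this would be easy in Python, here is a minimal Python sketch of the same transformation, offered only to show what the Pig script above computes. The function name unfold_rows and the sample lines are hypothetical and are not part of the original answer. It splits on the delimiter with no quoting rules, and it assumes the output order follows the input order.

```python
def unfold_rows(lines, delimiter=","):
    """Yield one single-field tuple per value, row by row.

    Hypothetical helper: illustrates what FLATTEN(TOBAG(*)) produces
    for rows with a varying number of columns.
    """
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            # Skip blank lines instead of emitting an empty field.
            continue
        for field in line.split(delimiter):
            yield (field,)


# Hypothetical sample rows with different widths, as in data.txt.
sample = ["val1a,val2a,val3a\n", "val1b,val2b\n", "val1c\n"]
for record in unfold_rows(sample):
    print(record)
```

The generator never needs to know the row width in advance, which corresponds to the requirement in the question that the number of columns per row is unknown.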

Note: You might also check these similar posts:
Splitting a tuple into multiple tuples in Pig
Pivot table with Apache Pig
