How do I transpose columns and rows in Pig?

Question

I'm not sure whether this can be done with built-in Pig operators or whether I'll need to write a UDF. I essentially have a table whose data I simply want to transpose.

Simply put, given:

(1, 2, 3, 4, 5)
(6, 7, 8, 9, 10)
(11, 12, 13, 14, 15)
 ... 300 plus more tuples

I would get:

(1,6,11,...) -> goes on for a few hundred more
(2,7,12,...)
(3,8,13,...)
(4,9,14,...)
(5,10,15,...)

Any suggestions on how I could accomplish this?

Recommended answer

This is not possible with Pig, nor does it make much sense for it to be. Remember that a relation is a bag of tuples, and by definition, a bag is not guaranteed to have its tuples in any specific order. You might start with

(1, 2, 3, 4, 5)
(6, 7, 8, 9, 10)
(11, 12, 13, 14, 15)

but from Pig's perspective there is no difference between this and

(11, 12, 13, 14, 15)
(1, 2, 3, 4, 5)
(6, 7, 8, 9, 10)

which means that "transpose" is ill-defined. Look at it this way -- if you transpose twice, you should end up with the same data structure back, but because the tuples can be reordered along the way, this is not guaranteed to happen.
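
To make the ordering argument concrete, here is a small sketch in Python (used only for illustration; it is not Pig, and the `transpose` helper is a hypothetical name, not a Pig or library function). It takes the same three tuples in the two orders shown above, which are equal when compared as bags, and shows that their transposes are not.

```python
from collections import Counter

def transpose(rows):
    """Transpose equal-length tuples, using whatever order the rows arrive in."""
    return [tuple(col) for col in zip(*rows)]

order_a = [(1, 2, 3, 4, 5), (6, 7, 8, 9, 10), (11, 12, 13, 14, 15)]
order_b = [(11, 12, 13, 14, 15), (1, 2, 3, 4, 5), (6, 7, 8, 9, 10)]

# Compared as bags (multisets, order ignored), the two inputs are equal.
same_bag = Counter(order_a) == Counter(order_b)

# Their transposes differ, even when compared as bags.
transposed_a = transpose(order_a)
transposed_b = transpose(order_b)
same_transpose_bag = Counter(transposed_a) == Counter(transposed_b)

print(same_bag)            # True
print(transposed_a[0])     # (1, 6, 11)
print(transposed_b[0])     # (11, 1, 6)
print(same_transpose_bag)  # False
```

Two inputs that an unordered relation cannot tell apart produce different outputs, which is why the operation has no single well-defined result there.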

In the end, if you really must do matrix operations, you would be better off using a tool that respects ordering in both rows and columns.
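
As a rough illustration of what "respects ordering" buys you, the sketch below uses a plain Python list of tuples, which keeps its rows in a fixed order. The `transpose` helper is again a hypothetical name, and this is only one possible stand-in for such a tool, not a recommendation from the original answer. With the order preserved, transposing twice returns the original data.

```python
def transpose(rows):
    """Transpose an ordered list of equal-length tuples."""
    return [tuple(col) for col in zip(*rows)]

# A list keeps its rows in a fixed order, unlike a bag.
matrix = [(1, 2, 3, 4, 5), (6, 7, 8, 9, 10), (11, 12, 13, 14, 15)]

once = transpose(matrix)
twice = transpose(once)

print(once[0])          # (1, 6, 11)
print(twice == matrix)  # True
```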

That said, what are you trying to accomplish?
