Accessing an element like an array in Pig

Problem description

I have data in the form: id,val1,val2

Example

1,0.2,0.1
1,0.1,0.7
1,0.2,0.3
2,0.7,0.9
2,0.2,0.3
2,0.4,0.5

First, I want to sort the rows for each id by val1 in decreasing order, giving something like:

1,0.2,0.1
1,0.2,0.3
1,0.1,0.7
2,0.7,0.9
2,0.4,0.5
2,0.2,0.3

Then I want to select the second element for each id, as an id,val2 pair. For example:

  1,0.3
  2,0.5

How should I approach this?

Thanks

Recommended answer

Pig is a scripting language rather than a relational one like SQL, so it is well suited to working with groups using operators nested inside a FOREACH. Here is a solution:

A = LOAD 'input' USING PigStorage(',') AS (id:int, v1:float, v2:float);
B = GROUP A BY id; -- isolate all rows for the same id
C = FOREACH B { -- here comes the scripting bit
    elems = ORDER A BY v1 DESC; -- sort rows belonging to the id
    two = LIMIT elems 2; -- select top 2
    two_invers = ORDER two BY v1 ASC; -- sort in opposite order to bubble second value to the top
    second = LIMIT two_invers 1;
    GENERATE FLATTEN(group) as id, FLATTEN(second.v2);
};
DUMP C;

In your example, id 1 has two rows with v1 == 0.2 but different v2 values. Their relative order is not determined by the sort on v1 alone, so the second value for id 1 can be either 0.1 or 0.3.
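
If a predictable result is needed despite the tie noted above, one option is to add a secondary sort key. The following is a minimal sketch, not part of the original answer. It assumes ties on v1 should be broken by v2 in ascending order (any other rule would work the same way), and it replaces FLATTEN(group) with a plain group AS id since the group key is a single field.

```pig
-- Variant of the script above with a tie-breaker on v2.
-- Assumption: equal v1 values are ranked by v2 ascending.
A = LOAD 'input' USING PigStorage(',') AS (id:int, v1:float, v2:float);
B = GROUP A BY id;                              -- one bag of rows per id
C = FOREACH B {
    elems = ORDER A BY v1 DESC, v2 ASC;         -- v2 decides the order among equal v1
    two = LIMIT elems 2;                        -- keep the top two rows
    two_invers = ORDER two BY v1 ASC, v2 DESC;  -- exact reverse of the ordering above
    second = LIMIT two_invers 1;                -- the second-ranked row is now first
    GENERATE group AS id, FLATTEN(second.v2) AS v2;
};
DUMP C;
```

With this tie-breaking rule, the two top rows for id 1 are ranked (0.2,0.1) then (0.2,0.3), so the second element is the one with v2 = 0.3, which matches the output requested in the question. Note that an id with only one row would return that single row, since there is no second element to select.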
