Calculating percentage in a Pig query


Question

  • I have a table with two columns (col1:string, col2:boolean)
  • Let's say col1 = "aaa"
  • For col1 = "aaa", there are many True/False values of col2
  • I want to calculate the percentage of True values for each col1 value (e.g. aaa)

Input:

aaa T
aaa F
aaa F
bbb T
bbb T
ccc F
ccc F

Output:

COL1   TOTAL_ROWS_IN_INPUT_TABLE   PERCENTAGE_TRUE_IN_INPUT_TABLE
aaa     3                          33%
bbb     2                          100%
ccc     2                          0%

How would I do this using Pig (Pig Latin)?

Recommended answer

In Pig 0.10, SUM(INPUT.col2) does not work, and casting to boolean is not possible, because Pig treats INPUT.col2 as a bag of booleans and a bag is not a primitive type. Also, if col2 is declared as boolean when loading, then a dump of the input shows no values for col2, whereas declaring it as chararray works just fine.

Pig is well suited to this type of task, as it can work with individual groups by using operators nested in a FOREACH. Here is a solution that works:

inpt = load '....' as (col1 : chararray, col2 : chararray);
grp = group inpt by col1; -- creates bags for each value in col1
result = foreach grp {
    total = COUNT(inpt);
    t = filter inpt by col2 == 'T'; --create a bag which contains only T values
    generate flatten(group) as col1,
             total as TOTAL_ROWS_IN_INPUT_TABLE,
             100*(double)COUNT(t)/(double)total as PERCENTAGE_TRUE_IN_INPUT_TABLE;
};

dump result;

Output:

(aaa,3,33.333333333333336)
(bbb,2,100.0)
(ccc,2,0.0)
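
Since SUM cannot be applied to the boolean column directly, one possible variation is to turn col2 into a numeric flag before grouping and sum that instead. The following is only a sketch under the same assumptions as above (col2 loaded as chararray with values 'T' and 'F'); the names flagged and is_true are made up for illustration, and the load path is left elided as in the original.

```pig
inpt = load '....' as (col1 : chararray, col2 : chararray);

-- is_true is a hypothetical helper column: 1 for 'T', 0 for anything else
flagged = foreach inpt generate col1, (col2 == 'T' ? 1 : 0) as is_true;

grp = group flagged by col1; -- one bag of (col1, is_true) tuples per col1 value

result = foreach grp generate
    group as col1,
    COUNT(flagged) as TOTAL_ROWS_IN_INPUT_TABLE,
    -- multiplying by 100.0 first forces floating-point division
    100.0 * SUM(flagged.is_true) / COUNT(flagged) as PERCENTAGE_TRUE_IN_INPUT_TABLE;

dump result;
```

This avoids the nested FILTER at the cost of an extra projection step; which form reads better is largely a matter of taste.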
