Calculating percentage in a Pig query
Problem description
- I have a table with two columns (col1: string, col2: boolean)
- Suppose col1 = "aaa"
- For col1 = "aaa", col2 holds many True/False values
- I want to calculate the percentage of True values for col1 (aaa)
Input:
aaa T
aaa F
aaa F
bbb T
bbb T
ccc F
ccc F
Output:
COL1 TOTAL_ROWS_IN_INPUT_TABLE PERCENTAGE_TRUE_IN_INPUT_TABLE
aaa 3 33%
bbb 2 100%
ccc 2 0%
How would I do this using Pig (Latin)?
Recommended answer
In Pig 0.10, SUM(INPUT.col2) does not work, and casting to boolean is not possible, because Pig treats INPUT.col2 as a bag of booleans and a bag is not a primitive type. In addition, if col2 is declared as boolean when loading, a dump of the input shows no values for col2; declaring it as chararray works just fine.
Pig is well suited to this kind of task because operators nested inside a FOREACH can work on each group individually. Here is a working solution:
inpt = load '....' as (col1 : chararray, col2 : chararray);
grp = group inpt by col1; -- creates a bag for each value in col1
result = foreach grp {
    total = COUNT(inpt);
    t = filter inpt by col2 == 'T'; -- create a bag which contains only T values
    generate flatten(group) as col1,
             total as TOTAL_ROWS_IN_INPUT_TABLE,
             100*(double)COUNT(t)/(double)total as PERCENTAGE_TRUE_IN_INPUT_TABLE;
};
dump result;
Output:
(aaa,3,33.333333333333336)
(bbb,2,100.0)
(ccc,2,0.0)
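To sanity-check these numbers without a Pig installation, the same group, count, and filter logic can be sketched in plain Python. This is only an illustrative cross-check, not part of the Pig solution; the function name percentage_true and the in-memory rows list are assumptions made for this example.

```python
from collections import defaultdict


def percentage_true(rows):
    """Return (col1, total_rows, percent_true) per col1 value, sorted by col1.

    Hypothetical helper mirroring the Pig script: `totals` plays the role
    of COUNT(inpt) and `trues` the role of COUNT(t) within each group.
    """
    totals = defaultdict(int)  # rows seen per col1 value
    trues = defaultdict(int)   # rows per col1 value where col2 == 'T'
    for col1, col2 in rows:
        totals[col1] += 1
        if col2 == "T":
            trues[col1] += 1
    # True division yields a float, like the (double) casts in the Pig script.
    return [(key, totals[key], 100 * trues[key] / totals[key])
            for key in sorted(totals)]


# Sample input from the question, with col2 kept as a string as in the answer.
rows = [
    ("aaa", "T"), ("aaa", "F"), ("aaa", "F"),
    ("bbb", "T"), ("bbb", "T"),
    ("ccc", "F"), ("ccc", "F"),
]

for record in percentage_true(rows):
    print(record)
```

On the sample input this yields one record per col1 value, with the same totals and percentages as the Pig output above.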