Hive: Is there a way to get the aggregates of all the numeric columns in a table?


Question

I have a table with over 50 columns (both numeric and char). Is there a way to get the overall statistics without specifying each column?

For example:

a  b   c   d
1  2   3   4
5  6   7   8
9  10  11  12

Ideally I would have something like:

column_name  min  avg  max  sum
a            1    5    9    15
b            2    6    10   18
c            3    7    11   21
d            4    8    12   24

That said, getting even one aggregate at a time would be more than helpful.

Any help or ideas would be highly appreciated.

Thank you,

Recommended answer

You can parse the DESCRIBE output for the table with AWK and generate a comma-separated string of sum(col) as sum_col expressions for the numeric columns, plus a column list for all the other columns. This example builds a select statement with a group by. Run in the shell:

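TABLE_NAME=your_schema.your_table

# Numeric columns become "sum(COL) as sum_col". The separator c is set only after a column has been printed, so the list never starts with a comma.
NUMERIC_COLUMNS=$(hive -S -e "set hive.cli.print.header=false; describe ${TABLE_NAME};" | awk -F " " 'f&&!NF{exit}{f=1}f{ if($2=="int"||$2=="double") {printf "%ssum(%s) as sum_%s", c, toupper($1), $1; c=","}}')

# All other columns are used as the group by keys.
GROUP_BY_COLUMNS=$(hive -S -e "set hive.cli.print.header=false; describe ${TABLE_NAME};" | awk -F " " 'f&&!NF{exit}{f=1}f{ if($2!="int"&&$2!="double") {printf "%s%s", c, toupper($1); c=","}}')

# A comma separates the aggregate list from the group by columns.
SELECT_STATEMENT="select $NUMERIC_COLUMNS, $GROUP_BY_COLUMNS from $TABLE_NAME group by $GROUP_BY_COLUMNS"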
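The question asks for min, avg, max and sum per column, one row per column. The same DESCRIBE-parsing idea could be stretched to generate that query, for instance as one select per numeric column joined with union all. This is only a sketch: the sample DESCRIBE output, the STATS_QUERY variable and the *_val aliases are made up for illustration, the casts to double are an assumption to keep the branch types aligned, and the generated query reads the table once per column.

```shell
#!/usr/bin/env bash
# Sketch only: the table name and the sample DESCRIBE output are made up.
TABLE_NAME=your_schema.your_table

# Stand-in for:
#   hive -S -e "set hive.cli.print.header=false; describe ${TABLE_NAME};"
DESCRIBE_OUTPUT=$(printf 'a\tint\t\nb\tdouble\t\nlabel\tstring\t\n')

# Emit one select per int/double column and join them with "union all".
# \047 is the octal escape for a single quote inside the awk program.
STATS_QUERY=$(printf '%s\n' "$DESCRIBE_OUTPUT" | awk -v t="$TABLE_NAME" '
  f && !NF {exit} {f=1}
  $2=="int" || $2=="double" {
    col = $1
    printf "%sselect \047%s\047 as column_name, ", sep, col
    printf "cast(min(%s) as double) as min_val, avg(%s) as avg_val, ", col, col
    printf "cast(max(%s) as double) as max_val, cast(sum(%s) as double) as sum_val ", col, col
    printf "from %s", t
    sep = " union all "
  }')

echo "$STATS_QUERY"
```

The resulting string can then be passed to hive -e in the same way as the describe statement.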

I'm checking only int and double columns; you can add more types. You can also optimize this by executing DESCRIBE only once and then parsing the result with the same AWK scripts. Hope you get the idea.
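Both suggestions, more numeric types and a single DESCRIBE call, might look roughly like the sketch below. The helper names numeric_sums and group_by_columns, the NUMERIC_TYPES pattern and the sample DESCRIBE output are assumptions made for illustration; the sample stands in for the real hive call so the script runs without a cluster.

```shell
#!/usr/bin/env bash
# Sketch only: table name, sample columns and helper names are hypothetical.
TABLE_NAME=your_schema.your_table

# In a real run, capture the DESCRIBE output once:
#   DESCRIBE_OUTPUT=$(hive -S -e "set hive.cli.print.header=false; describe ${TABLE_NAME};")
# A made-up sample is used here instead.
DESCRIBE_OUTPUT=$(printf 'id\tstring\t\na\tint\t\nb\tdouble\t\nc\tbigint\t\nd\tdecimal(10,2)\t\nname\tstring\t\n')

# Regex for the type column; extend it if the table uses other numeric types.
NUMERIC_TYPES='^(tinyint|smallint|int|bigint|float|double)$|^decimal'

# Print "sum(COL) as sum_col" for every numeric column, comma separated.
numeric_sums() {
  awk -v num="$NUMERIC_TYPES" 'f && !NF {exit} {f=1}
    $2 ~ num {printf "%ssum(%s) as sum_%s", c, toupper($1), $1; c=","}'
}

# Print every non-numeric column name, comma separated, for the group by.
group_by_columns() {
  awk -v num="$NUMERIC_TYPES" 'f && !NF {exit} {f=1}
    NF && $2 !~ num {printf "%s%s", c, toupper($1); c=","}'
}

# Both lists are built from the same captured output.
NUMERIC_COLUMNS=$(printf '%s\n' "$DESCRIBE_OUTPUT" | numeric_sums)
GROUP_BY_COLUMNS=$(printf '%s\n' "$DESCRIBE_OUTPUT" | group_by_columns)

SELECT_STATEMENT="select $NUMERIC_COLUMNS, $GROUP_BY_COLUMNS from $TABLE_NAME group by $GROUP_BY_COLUMNS"
echo "$SELECT_STATEMENT"
```

Keeping the type pattern in one variable means a new numeric type only has to be added in one place.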
