Hive: Is there a way to get the aggregates of all the numeric columns existing in a table?


Question

I have a table containing over 50 columns (both numeric and char). Is there a way to get the overall statistics without specifying each column?

For example:

a b c d
1 2 3 4
5 6 7 8
9 10 11 12

Ideally I would have something like:

column_name min avg max sum
a 1 5 9 15
b 2 6 10 18
c 3 7 11 21
d 4 8 12 24

Nevertheless, getting even one aggregate at a time would be more than helpful.

Any help/idea would be highly appreciated.

Thank you,

Recommended answer

You can parse the DESCRIBE TABLE output with AWK and generate a comma-separated string of SUM(col) as sum_col expressions for the numeric columns, plus a column list for all other columns. This example generates a select statement with a group by. Run in shell:

TABLE_NAME=your_schema.your_table

# Numeric columns become "sum(COL) as sum_col"; the separator is set only after a match, so the list never starts with a comma
NUMERIC_COLUMNS=$(hive -S -e "set hive.cli.print.header=false; describe ${TABLE_NAME};" | awk 'f&&!NF{exit}{f=1}$2=="int"||$2=="double"{printf "%ssum(%s) as sum_%s", c, toupper($1), $1; c=","}')

# All other columns are selected as-is and reused in the group by
GROUP_BY_COLUMNS=$(hive -S -e "set hive.cli.print.header=false; describe ${TABLE_NAME};" | awk 'f&&!NF{exit}{f=1}$2!="int"&&$2!="double"{printf "%s%s", c, toupper($1); c=","}')

# Note the comma between the two column lists
SELECT_STATEMENT="select $NUMERIC_COLUMNS, $GROUP_BY_COLUMNS from $TABLE_NAME group by $GROUP_BY_COLUMNS"

I'm checking only int and double columns; you can add more types. You can also optimize it by executing DESCRIBE only once and parsing the saved result with the same AWK scripts. Hope you get the idea.
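
As a rough sketch of that optimization, the same AWK approach can read the DESCRIBE output once and produce the layout asked for in the question (one row per numeric column with min, avg, max and sum). The function name build_stats_query, the output aliases (min_val and so on) and the list of numeric types are illustrative assumptions, not part of the original answer. The casts to double are also an assumption, added so that every branch of the UNION ALL returns the same types.

```shell
# build_stats_query TABLE
# Reads DESCRIBE output on stdin and prints one SELECT per numeric column,
# joined with UNION ALL. Hypothetical helper, not part of Hive.
build_stats_query() {
  awk -v table="$1" '
    f && !NF { exit }   # stop at the first blank line (partition info follows it)
    { f = 1 }
    # Assumed list of numeric types; adjust it to your schema.
    $2 ~ /^(tinyint|smallint|int|bigint|float|double|decimal(\([0-9,]+\))?)$/ {
      col = $1
      printf "%sselect \"%s\" as column_name, cast(min(%s) as double) as min_val, avg(%s) as avg_val, cast(max(%s) as double) as max_val, cast(sum(%s) as double) as sum_val from %s", sep, col, col, col, col, col, table
      sep = "\nunion all\n"   # separator is set only after a column was printed
    }
    END { if (sep != "") printf "\n" }
  '
}

# Usage (requires the hive CLI, so it is left commented out here):
#   QUERY=$(hive -S -e "describe ${TABLE_NAME};" | build_stats_query "$TABLE_NAME")
#   hive -e "$QUERY"
```

Each numeric column costs one scan of the table in this form, so for a wide table it may be cheaper to compute all aggregates in a single select and reshape the result afterwards.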
