How to create a unix script to loop a Hive SELECT query by taking table names as input from a file?
Question
It's pretty straightforward what I'm trying to do. I just need to count the records in multiple Hive tables.
I want to create a very simple hql script that takes a file.txt with table names as input and counts the total number of records in each of them:
SELECT COUNT(*) from <tablename>
The output should look something like:
table1 count1
table2 count2
table3 count3
I'm new to Hive and not very well versed in Unix scripting, and I'm unable to figure out how to create a script to perform this.
Can someone please help me in doing this? Thanks in advance.
Answer
Simple working shell script:
db=mydb
for table in $(hive -S -e "use $db; show tables;")
do
  #echo "$table"
  hive -S -e "use $db; select '$table' as table_name, count(*) as cnt from $table;"
done
You can improve this script and generate a file with the select commands, or even a single select with union all, then execute that file instead of calling Hive for each table.
If you want to read table names from file, use this:
for table in $(cat filename)
do
...
done
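A minimal self-contained sketch of that file-driven loop, using `echo` as a stand-in for the Hive call (the input file name `tables.txt` and the results file `counts.out` are assumptions for the example):

```shell
#!/bin/sh
db=mydb
# Sample input, for illustration only; in practice tables.txt already exists.
printf 'table1\ntable2\n' > tables.txt

# `while read` handles one table name per line and is safer than
# `for table in $(cat ...)` if the file contains unexpected whitespace.
: > counts.out
while read -r table; do
  [ -n "$table" ] || continue                  # skip blank lines
  # Stand-in for the real call:
  # hive -S -e "use $db; select '$table' as table_name, count(*) as cnt from $table;"
  echo "$db.$table" >> counts.out
done < tables.txt
cat counts.out
```

Replacing the `echo` line with the commented-out `hive -S -e` command gives the same per-table counts as the earlier `show tables` version, but driven by your file instead of the database catalog.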