Automated List of Hive External Tables


Problem description

I have to create an automated process that lists all external tables in Hive and does a record count on each of them.

This should run as a daily job. I tried hard-coding all the external table names, but that was not accepted because the set of tables changes about once a month.

I have looked at different approaches, such as SHOW TABLES and querying the metastore database directly, but these do not help me automate the process.

Is there a better approach to implement this in Hive?

Recommended answer

Something like this, using a shell script:

# Create external table list for a schema
SCHEMA=your_schema_name

# Define file names
alltableslist=tables_$SCHEMA
exttablelist=ext_tables_$SCHEMA

# Get all tables
hive -S -e "set hive.cli.print.header=false; use $SCHEMA; show tables;" > "$alltableslist"

# Start with an empty result file, because results are appended below
: > "$exttablelist"

# For each table, check its type
for table in $(cat "$alltableslist")
do
    echo "Processing table $table ..."

    # Describe table
    describe=$(hive -S -e "use $SCHEMA; DESCRIBE FORMATTED $table")

    # Get type: second tab-separated field of the "Table Type:" line, whitespace stripped
    table_type=$(echo "$describe" | grep -E -o 'Table Type:[^,]+' | cut -f2 | tr -d '[:space:]')

    # If the table is external, get its count and write the table name with the count
    if [ "$table_type" = "EXTERNAL_TABLE" ]; then
        # Get count
        cnt=$(hive -S -e "select count(*) from $SCHEMA.$table")
        # Save result (append, so earlier tables are not overwritten)
        echo "$table $cnt" >> "$exttablelist"
    fi

done # tables loop

Just replace your_schema_name at the beginning with your schema name. In this example, the external tables and their counts are saved in the file ext_tables_[your_schema_name].

The counts could be processed in parallel, or even in a single SQL statement, and many other things could be improved, but hopefully you get the idea.
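As a rough illustration of the single-statement idea, the sketch below builds one query that counts several tables at once by joining per-table SELECTs with UNION ALL. The helper name build_count_query and the table names orders and customers are made up for this example, and the sketch assumes a Hive version that accepts UNION ALL at the top level of a query.

```shell
# Hypothetical helper (not part of Hive): prints one SQL statement that
# returns a (table_name, cnt) row for every table given as an argument.
# Usage: build_count_query SCHEMA TABLE [TABLE ...]
# The body runs in a subshell, so its variables do not leak to the caller.
build_count_query() (
    schema=$1
    shift
    query=""
    for table in "$@"; do
        # Join the per-table SELECTs with UNION ALL
        if [ -n "$query" ]; then
            query="$query UNION ALL "
        fi
        query="${query}SELECT '$table' AS table_name, COUNT(*) AS cnt FROM $schema.$table"
    done
    printf '%s\n' "$query"
)

# Example with made-up table names; this only prints the generated SQL.
build_count_query your_schema_name orders customers

# To run it against Hive, the generated text could be passed to the CLI, e.g.:
#   hive -S -e "$(build_count_query "$SCHEMA" orders customers)"
```

Running everything in one statement means a single Hive session instead of one per table, which usually saves most of the startup overhead. The trade-off is that one failing table fails the whole query.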
