How do I output the results of a HiveQL query to CSV using a shell script?
Problem description
I would like to run multiple Hive queries, preferably in parallel rather than sequentially, and store the output of each query in a csv file: for example, the output of query1 in csv1, the output of query2 in csv2, and so on. I would be running these queries after leaving work, with the goal of having output to analyze during the next business day. I am interested in using a bash shell script because then I'd be able to set up a cron task to run it at a specific time of day.
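For the scheduling part, a crontab entry along these lines would do it; the script path, log path, and time here are placeholders, not anything from the question:

```shell
# Example crontab entry (installed with `crontab -e`):
# run the query script at 22:30 every weekday, appending output to a log.
30 22 * * 1-5 /home/user/run_queries.sh >> /home/user/run_queries.log 2>&1
```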
I know how to store the results of a HiveQL query in a CSV file, one query at a time. I do that with something like the following:
hive -e "SELECT * FROM db.table;" | tr "\t" "," > example.csv
The problem with the above is that I have to monitor when the process finishes and manually start the next query. I also know how to run multiple queries, in sequence, like so:
hive -f hivequeries.hql
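For reference, hivequeries.hql is just a file of semicolon-terminated statements that Hive runs in order, top to bottom; the table names below are placeholders:

```sql
-- hivequeries.hql: each statement runs sequentially
SELECT * FROM db.table1;
SELECT * FROM db.table2;
```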
Is there a way to combine these two methods? Is there a smarter way to achieve my goals?
Code answers are preferred since I do not know bash well enough to write it from scratch.
This question is a variant of another question: How do I output the results of a HiveQL query to CSV?
Answer
You can run and monitor parallel jobs in a shell script:
#!/bin/bash
#Run parallel processes and wait for their completion
#Add loop here or add more calls
hive -e "SELECT * FROM db.table1;" | tr "\t" "," > example1.csv &
hive -e "SELECT * FROM db.table2;" | tr "\t" "," > example2.csv &
hive -e "SELECT * FROM db.table3;" | tr "\t" "," > example3.csv &
#The trailing ampersand runs each command as a background (parallel) process
#You can wrap the hive call in a function and do some logging in it, etc
#And call the function as a background process in the same way
#Modify this script to fit your needs
#Now wait for all processes to complete
#Failed processes count
FAILED=0
for job in $(jobs -p); do
    echo "job=$job"
    wait "$job" || FAILED=$((FAILED + 1))
done
#Final status check
if [ "$FAILED" != "0" ]; then
echo "Execution FAILED! ($FAILED)"
#Do something here, log or send a message, etc
exit 1
fi
#Normal exit
#Do something else here
exit 0
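To see the jobs/wait pattern from the script above in isolation, here is a runnable sketch where sleep/true/false subshells stand in for the hive calls (one is made to fail on purpose); swap the placeholders for real commands:

```shell
#!/bin/bash
# Placeholder background jobs standing in for the hive calls;
# the second one deliberately exits non-zero.
(sleep 1; true)  > out1.txt &
(sleep 1; false) > out2.txt &
(sleep 1; true)  > out3.txt &

# Wait for every background job and count the failures
FAILED=0
for job in $(jobs -p); do
    wait "$job" || FAILED=$((FAILED + 1))
done

echo "failed=$FAILED"
```

Running this prints `failed=1`, since exactly one placeholder job exits non-zero; in the real script each job would be a `hive -e ... | tr ... > fileN.csv` pipeline instead.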
There are other ways (using xargs, GNU parallel) to run parallel processes in the shell, and plenty of resources on the topic. Read also https://www.slashroot.in/how-run-multiple-commands-parallel-linux and https://thoughtsimproved.wordpress.com/2015/05/18/parellel-processing-in-bash/
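As a minimal illustration of the xargs route, `-P` caps how many jobs run at once; here printf stands in for the hive pipeline, and the query names are placeholders:

```shell
# Run up to 3 jobs in parallel; each input line becomes one job.
# In real use, the sh -c body would be: hive -e "..." | tr "\t" "," > {}.csv
printf '%s\n' query1 query2 query3 |
  xargs -P 3 -I{} sh -c 'printf "result of %s\n" "{}" > {}.csv'
```

This produces query1.csv, query2.csv, and query3.csv, with the three jobs running concurrently.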