How do I output the results of a HiveQL query to CSV?

Question

We would like to put the results of a Hive query into a CSV file. I thought the command should look like this:

insert overwrite directory '/home/output.csv' select books from table;

When I run it, it says it completed successfully, but I can never find the file. How do I find this file, or should I be extracting the data in a different way?

Recommended answer

Although it is possible to use INSERT OVERWRITE to get data out of Hive, it might not be the best method for your particular case. First, let me explain what INSERT OVERWRITE does; then I'll describe the method I use to get TSV files from Hive tables.

According to the manual, your query will store the data in a directory in HDFS, so '/home/output.csv' is treated as an HDFS directory name rather than a file on the local disk. The format will not be CSV.

Data written to the filesystem is serialized as text with columns separated by ^A and rows separated by newlines. If any of the columns are not of primitive type, then those columns are serialized to JSON format.

A slight modification (adding the LOCAL keyword) will store the data in a local directory.

INSERT OVERWRITE LOCAL DIRECTORY '/home/lvermeer/temp' select books from table;

When I run a similar query, here's what the output looks like.

[lvermeer@hadoop temp]$ ll
total 4
-rwxr-xr-x 1 lvermeer users 811 Aug  9 09:21 000000_0
[lvermeer@hadoop temp]$ head 000000_0 
"row1""col1"1234"col3"1234FALSE
"row2""col1"5678"col3"5678TRUE
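
If you already have files in this ^A-separated format, a quick conversion along these lines may be enough. This is a minimal sketch, assuming no field contains a comma, a double quote, or a newline; the sample file name and its contents are made up for illustration.

```shell
# Hypothetical sample mimicking Hive's default text output:
# fields are separated by ^A (octal 001), rows by newlines.
printf 'row1\001col1\0011234\nrow2\001col1\0015678\n' > 000000_0.sample

# Replace every ^A with a comma.
# Only safe when no field itself contains a comma, quote, or newline.
tr '\001' ',' < 000000_0.sample > output.csv

cat output.csv
```

For data that may contain commas or quotes, a proper CSV writer is the safer choice.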

Personally, I usually run my query directly through Hive on the command line for this kind of thing, and pipe it into the local file like so:

hive -e 'select books from table' > /home/lvermeer/temp.tsv

That gives me a tab-separated file that I can use. Hope that is useful for you as well.
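
If you need commas rather than tabs, the tab-separated output can be passed through a small filter. The following is only a sketch: `tsv_to_csv` is a hypothetical helper name, and it assumes the data contains no embedded tabs or newlines. Fields containing a comma or a double quote are wrapped in quotes, with embedded quotes doubled.

```shell
# tsv_to_csv (hypothetical helper): read tab-separated lines on stdin,
# write comma-separated lines on stdout.
tsv_to_csv() {
  awk 'BEGIN { FS = "\t"; OFS = "," }
  {
    for (i = 1; i <= NF; i++) {
      # Quote fields that contain a comma or a double quote.
      if ($i ~ /[",]/) {
        gsub(/"/, "\"\"", $i)
        $i = "\"" $i "\""
      }
    }
    $1 = $1   # force awk to rebuild the record using OFS
    print
  }'
}

# Intended usage with the query from above (not run here):
#   hive -e 'select books from table' | tsv_to_csv > /home/lvermeer/temp.csv

# Made-up sample input for illustration:
printf 'Dune\t1965\nEast of Eden, a novel\t1952\n' | tsv_to_csv
```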

Based on patch-3682, I suspect a better solution is available when using Hive 0.11, but I have not been able to test it myself. The new syntax should allow the following:

INSERT OVERWRITE LOCAL DIRECTORY '/home/lvermeer/temp' 
ROW FORMAT DELIMITED 
FIELDS TERMINATED BY ',' 
select books from table;
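
Either way, the target is a directory that can hold more than one part file (000000_0, 000001_0, and so on), so a final step to combine them into a single file may be needed. A minimal sketch, using a temporary directory and made-up contents as a stand-in for the real output directory:

```shell
# Hypothetical stand-in for the output directory;
# with Hive this would be /home/lvermeer/temp.
outdir=$(mktemp -d)
printf 'row1,1234\n' > "$outdir/000000_0"
printf 'row2,5678\n' > "$outdir/000001_0"

# Concatenate the part files in name order into one CSV file.
cat "$outdir"/0* > "$outdir/books.csv"

cat "$outdir/books.csv"
```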

Hope that helps.
