How do I output the results of a HiveQL query to CSV?


Question

We would like to put the results of a Hive query into a CSV file. I thought the command should look like this:

insert overwrite directory '/home/output.csv' select books from table;

When I run it, it says it completed successfully, but I can never find the file. How do I find this file, or should I be extracting the data in a different way?

Thanks!

Recommended answer

Although it is possible to use INSERT OVERWRITE to get data out of Hive, it might not be the best method for your particular case. First let me explain what INSERT OVERWRITE does, then I'll describe the method I use to get TSV files from Hive tables.

According to the manual, your query will store the data in a directory in HDFS. The format will not be CSV.


Data written to the filesystem is serialized as text with columns separated by ^A and rows separated by newlines. If any of the columns are not of primitive type, then those columns are serialized to JSON format.
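
To make the point about HDFS and the ^A delimiter concrete, here is a minimal sketch of how the output of the original query could be pulled out of HDFS and turned into comma-separated text. The function name hdfs_dir_to_csv and the output file name books.csv are made up for illustration, and the sketch assumes the hadoop CLI is on the PATH and that no field contains a comma, double quote, or newline.

```shell
# Sketch: stream every part file of a Hive output directory on HDFS
# and replace the ^A (byte 0x01) column delimiter with commas.
# Assumption: fields contain no commas, double quotes, or newlines,
# because no CSV quoting is applied here.
hdfs_dir_to_csv() {
  # The glob is quoted so that Hadoop, not the local shell, expands it.
  hadoop fs -cat "$1/*" | tr '\001' ','
}

# Hypothetical usage with the path from the question:
#   hadoop fs -ls /home/output.csv
#   hdfs_dir_to_csv /home/output.csv > books.csv
```

Note that '/home/output.csv' in the original query names a directory on HDFS, not a local file, which is likely why the file could not be found on the local disk.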

A slight modification (adding the LOCAL keyword) will store the data in a local directory.

INSERT OVERWRITE LOCAL DIRECTORY '/home/lvermeer/temp' select books from table;

When I run a similar query, here's what the output looks like.

[lvermeer@hadoop temp]$ ll
total 4
-rwxr-xr-x 1 lvermeer users 811 Aug  9 09:21 000000_0
[lvermeer@hadoop temp]$ head 000000_0 
"row1""col1"1234"col3"1234FALSE
"row2""col1"5678"col3"5678TRUE

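The columns in the listing above look as if they run together because the ^A delimiter is a non-printing character. As a small sketch, cat -v (available in GNU and BSD userlands, though not required by POSIX) can be used to make it visible. The helper name show_delimiters is hypothetical.

```shell
# Sketch: make control characters visible so the column boundaries
# in a Hive part file can be inspected. Byte 0x01 is rendered as ^A.
show_delimiters() {
  cat -v "$@"
}

# Hypothetical usage against the part file shown above:
#   show_delimiters 000000_0 | head
```
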
Personally, I usually run my query directly through Hive on the command line for this kind of thing, and pipe it into the local file like so:

hive -e 'select books from table' > /home/lvermeer/temp.tsv

That gives me a tab-separated file that I can use. Hope that is useful for you as well.
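
If a comma-separated file is needed rather than a tab-separated one, the output of hive -e can be passed through a small filter. The following is a sketch, not part of the original answer: tsv_to_csv is a hypothetical helper, and it assumes that fields never contain tabs or newlines. Fields containing a comma or a double quote are wrapped in double quotes, with embedded quotes doubled.

```shell
# Sketch: convert tab-separated lines on stdin to CSV on stdout.
# Assumption: fields contain no tabs or newlines.
tsv_to_csv() {
  awk 'BEGIN { FS = "\t"; OFS = "," }
       {
         for (i = 1; i <= NF; i++) {
           if ($i ~ /[",]/) {
             gsub(/"/, "\"\"", $i)    # double any embedded quotes
             $i = "\"" $i "\""        # wrap the field in quotes
           }
         }
         $1 = $1                      # force the record to be rebuilt with OFS
         print
       }'
}

# Hypothetical usage with the query from the answer:
#   hive -e 'select books from table' | tsv_to_csv > /home/lvermeer/temp.csv
```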

Based on this patch (HIVE-3682), I suspect a better solution is available when using Hive 0.11, but I am unable to test this myself. The new syntax should allow the following.

INSERT OVERWRITE LOCAL DIRECTORY '/home/lvermeer/temp' 
ROW FORMAT DELIMITED 
FIELDS TERMINATED BY ',' 
select books from table;

Hope that helps.
