Export table enclosing values with quotes to local CSV in Hive
Problem description
I am trying to export a table to a local csv file in hive.
INSERT OVERWRITE LOCAL DIRECTORY '/home/sofia/temp.csv'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
ESCAPED BY '\\'
LINES TERMINATED BY '\n'
select * from mytable;
The problem is that some of the values contain the newline "\n" character and the resulting file becomes really messy.
Is there any way of enclosing the values in quotes when exporting in Hive, so that the csv file can contain special characters (and especially the newline)?
Recommended answer
One possible solution is to use the Hive CSV SerDe (Serializer/Deserializer). It provides a way to specify custom delimiter, quote, and escape characters.
Limitations:
It does not handle embedded newlines
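
Because of this limitation, one possible workaround is to replace the line breaks inside the values before they are written. The following is only a minimal sketch: the column names `id` and `comment_text` are hypothetical and stand in for whatever columns `mytable` really has; `regexp_replace` is a built-in Hive function.

```sql
-- Sketch only: `id` and `comment_text` are hypothetical column names.
-- Every run of carriage returns / line feeds inside the value is replaced
-- with a single space, so each exported row stays on one physical line.
SELECT
  id,
  regexp_replace(comment_text, '[\\r\\n]+', ' ') AS comment_text
FROM mytable;
```

This changes the exported data (the original line breaks are lost), so it only fits cases where the consumer of the CSV does not need them.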
Availability:
The CSV SerDe is available in Hive 0.14 and greater.
Background:
The CSV SerDe is based on https://github.com/ogrodnek/csv-serde and was added to the Hive distribution in HIVE-7777.
Usage:
This SerDe works for most CSV data, but does not handle embedded newlines. To use the SerDe, specify the fully qualified class name org.apache.hadoop.hive.serde2.OpenCSVSerde.
The original documentation is available at https://github.com/ogrodnek/csv-serde.
CREATE TABLE my_table(a string, b string, ...)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
"separatorChar" = "\t",
"quoteChar" = "'",
"escapeChar" = "\\"
)
STORED AS TEXTFILE;
Default separator, quote, and escape characters (if not specified):
DEFAULT_ESCAPE_CHARACTER \
DEFAULT_QUOTE_CHARACTER "
DEFAULT_SEPARATOR ,
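
Applied to the table from the question, the SerDe could be used through a staging table whose files are written as quoted CSV. This is a sketch under stated assumptions, not a tested recipe: the table name `mytable_csv` and the columns `id` and `comment_text` are hypothetical, and the properties simply spell out the defaults listed above.

```sql
-- Sketch only: `mytable_csv`, `id` and `comment_text` are hypothetical names.
-- The OpenCSVSerde treats every column as a string, so all columns are
-- declared as string here.
CREATE TABLE mytable_csv (id string, comment_text string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  "separatorChar" = ",",
  "quoteChar"     = "\"",
  "escapeChar"    = "\\"
)
STORED AS TEXTFILE;

-- Writing through the SerDe encloses the values in the quote character.
-- Embedded newlines are still not handled by the SerDe itself.
INSERT OVERWRITE TABLE mytable_csv
SELECT id, comment_text
FROM mytable;
```

The files under the staging table's storage location then hold the quoted CSV and can be copied to the local filesystem, for example with `hadoop fs -getmerge`.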
Reference: Hive CSV SerDe