Export table enclosing values with quotes to local csv in hive


Question

I am trying to export a table to a local CSV file in Hive.

INSERT OVERWRITE LOCAL DIRECTORY '/home/sofia/temp.csv' 
ROW FORMAT DELIMITED 
FIELDS TERMINATED BY ',' 
ESCAPED BY '\\' 
LINES TERMINATED BY '\n'
select * from mytable;

The problem is that some of the values contain the newline character "\n", so the resulting file becomes really messy.

Is there any way of enclosing the values in quotes when exporting in Hive, so that the CSV file can contain special characters (especially the newline)?

Recommended answer

One possible solution is to use the Hive CSV SerDe (Serializer/Deserializer). It provides a way to specify custom delimiter, quote, and escape characters.

Limitation:

It does not handle embedded newlines.

Availability:

The CSV SerDe is available in Hive 0.14 and later.

Background:

The CSV SerDe is based on https://github.com/ogrodnek/csv-serde and was added to the Hive distribution in HIVE-7777.

Usage:

This SerDe works for most CSV data, but does not handle embedded newlines. To use the SerDe, specify the fully qualified class name org.apache.hadoop.hive.serde2.OpenCSVSerde.

The original documentation is available at https://github.com/ogrodnek/csv-serde.

CREATE TABLE my_table(a string, b string, ...)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
   "separatorChar" = "\t",
   "quoteChar"     = "'",
   "escapeChar"    = "\\"
)  
STORED AS TEXTFILE;
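
To apply this to the export in the question, one option is to write the rows through a staging table that uses the SerDe and then copy that table's files out of the warehouse. The following is only a minimal sketch: the staging table name mytable_csv and the columns a and b are hypothetical placeholders, and it assumes every exported column can be treated as a string, since OpenCSVSerde handles all columns as strings.

```sql
-- Hypothetical staging table; replace a and b with the real columns of mytable.
CREATE TABLE mytable_csv (a string, b string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
   "separatorChar" = ",",
   "quoteChar"     = '"',
   "escapeChar"    = "\\"
)
STORED AS TEXTFILE;

-- Rows written through the SerDe should have their values enclosed
-- in the configured quote character.
INSERT OVERWRITE TABLE mytable_csv
SELECT a, b FROM mytable;
```

The text files under the staging table's storage location can then be copied to the local file system, for example with hadoop fs -getmerge. Embedded newlines are still not handled by this approach.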

Default separator, quote, and escape characters (used if not specified):

DEFAULT_ESCAPE_CHARACTER \
DEFAULT_QUOTE_CHARACTER  "
DEFAULT_SEPARATOR        ,
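
Because quoting alone does not get around the embedded-newline limitation, a possible workaround is to replace line breaks inside the values before they are written. The sketch below reuses the export statement from the question and assumes two hypothetical string columns a and b, where b may contain line breaks. Note that it changes the exported data, so it only fits cases where the line breaks do not need to be preserved.

```sql
-- Column names a and b are placeholders for the real columns of mytable.
INSERT OVERWRITE LOCAL DIRECTORY '/home/sofia/temp.csv'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
ESCAPED BY '\\'
LINES TERMINATED BY '\n'
SELECT a,
       -- Replace each carriage return or line feed in b with a space.
       regexp_replace(b, '\\r|\\n', ' ') AS b
FROM mytable;
```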

Reference: Hive csv-serde
