Hive load CSV with commas in quoted fields


Problem description

I am trying to load a CSV file into a Hive table like so:

CREATE TABLE mytable
(
num1 INT,
text1 STRING,
num2 INT,
text2 STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ",";

LOAD DATA LOCAL INPATH '/data.csv'
OVERWRITE INTO TABLE mytable;    


The CSV is delimited by a comma (,) and looks like this:

1, "some text, with comma in it", 123, "more text"

This returns corrupt data because there is a ',' inside the first quoted string.
Is there a way to set a text delimiter or make Hive ignore the ',' in strings?

I can't change the delimiter of the CSV since it gets pulled from an external source.

Solution

The problem is that Hive doesn't handle quoted text. You either need to pre-process the data by changing the delimiter between the fields (e.g. with a Hadoop streaming job), or you can try a custom CSV SerDe that uses OpenCSV to parse the files.
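
As a rough sketch of the SerDe route: Hive 0.14 and later bundle an OpenCSV-based SerDe, org.apache.hadoop.hive.serde2.OpenCSVSerde, so a separate jar may not be needed. This assumes your Hive version ships that class. The names mytable_csv and mytable_typed are made up for illustration. The SerDe reads every column as STRING, so the numeric columns are cast afterwards.

```sql
-- Hypothetical staging table; OpenCSVSerde exposes all columns as STRING.
CREATE TABLE mytable_csv
(
  num1 STRING,
  text1 STRING,
  num2 STRING,
  text2 STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  "separatorChar" = ",",
  "quoteChar"     = "\"",
  "escapeChar"    = "\\"
)
STORED AS TEXTFILE;

LOAD DATA LOCAL INPATH '/data.csv'
OVERWRITE INTO TABLE mytable_csv;

-- Hypothetical typed view; TRIM strips the space that follows each comma in the sample row.
CREATE VIEW mytable_typed AS
SELECT
  CAST(TRIM(num1) AS INT) AS num1,
  TRIM(text1)             AS text1,
  CAST(TRIM(num2) AS INT) AS num2,
  TRIM(text2)             AS text2
FROM mytable_csv;
```

The sample row has a space between each comma and the opening quote. Whether the bundled OpenCSV version tolerates that is worth checking on a few rows before loading the full file.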
