Hive load CSV with commas in quoted fields
Question
I am trying to load a CSV file into a Hive table like so:
CREATE TABLE mytable
(
num1 INT,
text1 STRING,
num2 INT,
text2 STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ",";
LOAD DATA LOCAL INPATH '/data.csv'
OVERWRITE INTO TABLE mytable;
The CSV is delimited by a comma (,) and looks like this:
1, "some text, with comma in it", 123, "more text"
This returns corrupt data because there is a ',' inside the first string.
Is there a way to set a text delimiter (quote character) or make Hive ignore the ',' inside strings?
I can't change the delimiter of the csv since it gets pulled from an external source.
Recommended answer
The problem is that Hive's default delimited row format doesn't handle quoted text. You either need to pre-process the data by changing the delimiter between the fields (e.g. with a Hadoop streaming job), or you can try a custom CSV SerDe that uses OpenCSV to parse the files.
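As a rough sketch of the SerDe route: Hive 0.14 and later ship an OpenCSV-based SerDe, org.apache.hadoop.hive.serde2.OpenCSVSerde, so a table definition along these lines may work without a third-party jar. This is an assumption about your Hive version; on older releases you would have to add a custom CSV SerDe jar instead. This SerDe reads every column as STRING, so the numeric columns are declared as STRING here and cast in a view. The view name mytable_typed is made up for illustration. Whether the space after each comma in the sample row is tolerated in front of the opening quote is worth checking on a small sample first.

```sql
-- Sketch only: assumes Hive 0.14+ where OpenCSVSerde is bundled.
-- OpenCSVSerde treats all columns as STRING, whatever type is declared.
CREATE TABLE mytable
(
  num1  STRING,
  text1 STRING,
  num2  STRING,
  text2 STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  "separatorChar" = ",",   -- field separator
  "quoteChar"     = "\"",  -- commas inside quoted fields are kept as data
  "escapeChar"    = "\\"
)
STORED AS TEXTFILE;

LOAD DATA LOCAL INPATH '/data.csv'
OVERWRITE INTO TABLE mytable;

-- Hypothetical view that restores the intended types.
-- TRIM guards against the blanks that follow each comma in the sample row.
CREATE VIEW mytable_typed AS
SELECT
  CAST(TRIM(num1) AS INT) AS num1,
  TRIM(text1)             AS text1,
  CAST(TRIM(num2) AS INT) AS num2,
  TRIM(text2)             AS text2
FROM mytable;
```

Querying mytable_typed rather than mytable keeps the casts in one place, so the raw table can stay a plain mirror of the external file.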