How to load CSV data enclosed in double quotes and separated by tabs into a Hive table?
Problem description
I am trying to load data from a CSV file in which the values are enclosed in double quotes '"' and separated by tabs '\t'. When I load it into Hive, no error is thrown and the data loads. However, I think all the data is going into a single column, and most of the values show up as NULL. Below is my CREATE TABLE statement:
CREATE TABLE example
(
organization STRING,
order BIGINT,
created_on TIMESTAMP,
issue_date TIMESTAMP,
qty INT
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
ESCAPED BY '"'
STORED AS TEXTFILE;
Input file sample:
"Organization" "Order" "Created on" "issue_date" "qty"
"GB" "111223" "2015/02/06 00:00:00" "2015/05/15 00:00:00" "5"
"UK" "1110" "2015/05/06 00:00:00" "2015/06/1 00:00:00" "51"
And the LOAD statement I use to push the data into the Hive table:
LOAD DATA INPATH '/user/example.csv' OVERWRITE INTO TABLE example
What could be the issue, and how can I ignore the header of the file? Also, if I remove ESCAPED BY '"' from the CREATE statement, the data loads into the respective columns, but all the values are still enclosed in double quotes. How can I remove the double quotes from the values and ignore the header of the file?
You can now use OpenCSVSerde (available since Hive 0.14). It lets you define both the separator character and the quote character, so the surrounding double quotes are stripped from each value:
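CREATE EXTERNAL TABLE example (
organization STRING,
-- ORDER is a reserved keyword in HiveQL; backticks allow it as a column name
`order` BIGINT,
created_on TIMESTAMP,
issue_date TIMESTAMP,
qty INT
)
-- OpenCSVSerde splits each line on separatorChar and removes quoteChar around each field
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
"separatorChar" = "\t",
"quoteChar" = "\""
)
-- Point LOCATION at the HDFS directory that holds the CSV file(s)
LOCATION '/your/folder/location/';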
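The answer above does not cover the second half of the question, which is ignoring the header row. The following is a minimal sketch, assuming Hive 0.13 or later and the `example` table created above.

- **Skipping the header:** the `skip.header.line.count` table property tells Hive to skip the first line of each file.
- **Column types:** OpenCSVSerde exposes every column as STRING regardless of the declared types.
- **Date format:** the sample dates use a `yyyy/MM/dd` layout, which a Hive TIMESTAMP cast does not parse directly.

For these reasons the sketch converts the strings in a view. The view name `example_typed` is hypothetical, and the date pattern is an assumption based on the sample rows.

```sql
-- Skip the first line of each file under the table location (the header row)
ALTER TABLE example SET TBLPROPERTIES ("skip.header.line.count" = "1");

-- OpenCSVSerde returns every column as STRING, so cast to the intended types here.
-- example_typed is a hypothetical view name; the pattern assumes yyyy/MM/dd HH:mm:ss input.
CREATE VIEW example_typed AS
SELECT
  organization,
  CAST(`order` AS BIGINT) AS `order`,
  CAST(from_unixtime(unix_timestamp(created_on, 'yyyy/MM/dd HH:mm:ss')) AS TIMESTAMP) AS created_on,
  CAST(from_unixtime(unix_timestamp(issue_date, 'yyyy/MM/dd HH:mm:ss')) AS TIMESTAMP) AS issue_date,
  CAST(qty AS INT) AS qty
FROM example;
```

Queries can then read from `example_typed` instead of `example`. The CAST-in-a-view approach is the one the Hive CSV SerDe documentation suggests for getting non-string types. Check the result on a few rows, since timestamp parsing behavior can vary between Hive versions.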