Hive table from CSV: line breaks inside quoted fields

Problem description

I am trying to create a table from a CSV file that is stored in HDFS. The problem is that the CSV contains line breaks inside quoted fields. Example of a record in the CSV:

ID,PR_ID,SUMMARY
2063,1184,"This is problem field because consists line break

This is not new record but it is part of text of third column
"

I created the Hive table:

CREATE TEMPORARY EXTERNAL TABLE  hive_database.hive_table
(   
    ID STRING,
    PR_ID STRING,
    SUMMARY STRING 
)
row format serde 'com.bizo.hive.serde.csv.CSVSerde'
with serdeproperties (
    "separatorChar" = ",",
    "quoteChar"     = "\"",
    "escapeChar"  = "\""
)     
stored as textfile
LOCATION '/path/to/hdfs/dir/csv'
tblproperties('skip.header.line.count'='1');

Then I count the rows (the correct result should be 1):

Select count(*) from hive_database.hive_table;

But the result is 4, which is incorrect. Do you have any idea how to solve it? Thanks, all.
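The count of 4 can be reproduced outside Hive. The following is a minimal Python sketch, assuming the text input format splits the file into records on every newline before the SerDe ever sees the quotes. `SAMPLE`, `count_line_records`, and `count_csv_records` are illustrative names, not part of Hive.

```python
import csv
import io

# The sample file from the question: one header line and one logical record
# whose third field spans four physical lines.
SAMPLE = (
    'ID,PR_ID,SUMMARY\n'
    '2063,1184,"This is problem field because consists line break\n'
    '\n'
    'This is not new record but it is part of text of third column\n'
    '"\n'
)

def count_line_records(text, header_lines=1):
    """Count records the way a newline-based reader does: one per physical line."""
    return len(text.splitlines()) - header_lines

def count_csv_records(text, header_lines=1):
    """Count records with a quote-aware CSV parser."""
    reader = csv.reader(io.StringIO(text, newline=''))
    return sum(1 for _ in reader) - header_lines

print(count_line_records(SAMPLE))  # 4
print(count_csv_records(SAMPLE))   # 1
```

The file has five physical lines, so skipping the header leaves four rows when every newline ends a record. The quoteChar setting in the SerDe cannot change this, because it is applied to each row only after the rows have already been split.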

Recommended answer

I found the solution. You can define your own input format (InputFormat). The DDL for the Hive table then looks like this (first you need to add your custom JAR file):

ADD JAR /path/to/your/jar/CSVCustomInputFormat.jar;
DROP TABLE hive_database.hive_table;
CREATE EXTERNAL TABLE  hive_database.hive_table
(   
    ID STRING,
    PR_ID STRING,
    SUMMARY STRING 
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
   "separatorChar" = ",",
   "quoteChar"     = "\"",
   "escapeChar"    = "\\"
) 
STORED AS 
INPUTFORMAT 'com.hql.custom.formatter.CSVCustomInputFormatt' 
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat' 
LOCATION '/path/to/hdfs/dir/csv'
tblproperties('skip.header.line.count'='1');

For how to create the custom input format, see for example: https://analyticsanvil.wordpress.com/2016/03/06/creating-a-custom-hive-input-format-and-record-reader-to-read-fixed-format-flat-files/
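The linked article deals with fixed-format files, so the record-assembly logic for this case has to be written separately. A real Hive input format is a Java class with its own record reader; the sketch below only illustrates, in Python, the kind of logic such a reader might use: keep appending physical lines until the quote characters balance. The function name `logical_records` is hypothetical, and the sketch assumes quotes inside a field are escaped by doubling them (`""`), not with a backslash.

```python
def logical_records(lines, quote='"'):
    """Yield logical CSV records, joining physical lines until quotes balance.

    Assumption: embedded quotes are doubled (""), so they add an even
    number of quote characters and never change the in-quotes state.
    """
    buffer = []
    in_quotes = False
    for line in lines:
        # An odd number of quote characters on a line flips the state.
        if line.count(quote) % 2 == 1:
            in_quotes = not in_quotes
        buffer.append(line)
        if not in_quotes:
            # Quotes are balanced: the buffered lines form one record.
            yield "\n".join(buffer)
            buffer = []
    if buffer:
        # Unterminated quote at end of input: emit what is left.
        yield "\n".join(buffer)


physical_lines = [
    'ID,PR_ID,SUMMARY',
    '2063,1184,"This is problem field because consists line break',
    '',
    'This is not new record but it is part of text of third column',
    '"',
]
records = list(logical_records(physical_lines))
print(len(records) - 1)  # 1 data record after skipping the header
```

A reader built this way also has to make sure a record is never cut at an input split boundary, for example by treating each file as non-splittable; that part is not shown here.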
