How to handle a delimiter in a column value?
Question
I am trying to load CSV file data into my Hive table, but one column's values contain the delimiter (,), so Hive treats it as a delimiter and loads the rest of the value into a new column. I tried using the escape character \, but it doesn't work: the data after the , is always loaded into a new column.
My CSV file:
id,name,desc,per1,roll,age
226,a1,"\"double bars","item1 and item2\"",0.0,10,25
227,a2,"\"doubles","item2 & item3 item4\"",0.1,20,35
228,a3,"\"double","item3 & item4 item5\"",0.2,30,45
229,a4,"\"double","item5 & item6 item7\"",0.3,40,55
I have updated my table:
create table testing(id int, name string, desc string, uqc double, roll int, age int)
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
"separatorChar" = ",",
"quoteChar" = '"',
"escapeChar" = "\\" ) STORED AS textfile;
But the data after the , still ends up in a different column.
I'm loading the data with the LOAD DATA INPATH command.
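One way to see why the escape character doesn't help: in the file, the desc value is itself written as two separately quoted fields ("\"double bars" and "item1 and item2\""), so even a parser that honors \ as an escape finds seven fields in each data row against six column names. The sketch below uses Python's csv module as a stand-in for OpenCSVSerde, assuming the two parsers treat the quote and escape characters the same way for this input.

```python
import csv
import io

# Header plus the first data row from the file above (raw string keeps the backslashes).
data = "id,name,desc,per1,roll,age\n" + r'226,a1,"\"double bars","item1 and item2\"",0.0,10,25' + "\n"

# Same settings as the SERDEPROPERTIES: separator ',', quote '"', escape '\'.
reader = csv.reader(io.StringIO(data), delimiter=",", quotechar='"',
                    escapechar="\\", doublequote=False)
header, row = list(reader)

print(len(header))  # 6 column names
print(len(row))     # 7 fields: desc is split in two
print(row[2:4])     # ['"double bars', 'item1 and item2"']
```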
Answer
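Because each desc value in the file is written as two quoted fields separated by a comma, a CSV SerDe will always split it. A regular expression can instead capture everything between the first and the last quote as a single column. This is how to create the table with RegexSerDe.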
Each column must have a corresponding capturing group () in the regex. You can debug the regex without creating the table by using regexp_replace:
select regexp_replace('226,a1,"\"double bars","item1 and item2\"",0.0,10,25',
'^(\\d+?),(.*?),"(.*)",([0-9.]*),([0-9]*),([0-9]*).*', --6 groups
'$1 $2 $3 $4 $5 $6'); --space delimited fields
Result:
226 a1 "double bars","item1 and item2" 0.0 10 25
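The same check can be reproduced outside Hive. This is a minimal Python sketch, assuming Python's re module behaves like Java's regex engine for this pattern (it uses only \d, character classes, and greedy/lazy quantifiers). Note that Hive unescapes \" inside the string literal, so the value the query above actually tests contains plain double quotes and no backslashes; the sample string is written that way here.

```python
import re

# The pattern from the query above; HiveQL's '\\d' is written r'\d' in Python.
PATTERN = re.compile(r'^(\d+?),(.*?),"(.*)",([0-9.]*),([0-9]*),([0-9]*).*')

# The literal as Hive sees it after unescaping \" to ".
sample = '226,a1,""double bars","item1 and item2"",0.0,10,25'

# One capturing group per table column.
print(PATTERN.groups)  # 6

# Equivalent of regexp_replace(..., '$1 $2 $3 $4 $5 $6').
print(PATTERN.sub(r'\1 \2 \3 \4 \5 \6', sample))
# 226 a1 "double bars","item1 and item2" 0.0 10 25
```

The greedy (.*) in group 3 is what keeps the comma inside desc: it extends to the last ", that still lets the three numeric groups match.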
If the result looks right, create the table:
create external table testing(id int,
name string,
desc string,
uqc double,
roll int,
age int
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES ('input.regex'='^(\\d+?),(.*?),"(.*)",([0-9.]*),([0-9]*),([0-9]*).*')
location ....
TBLPROPERTIES("skip.header.line.count"="1")
;
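To preview what the table returns for each line, here is a rough Python model of RegexSerDe's row handling. It is an approximation, not Hive's code: the regex must match the entire line, capturing group N becomes column N, and each value is converted to the column type, with None (NULL) where conversion fails. The deserialize function and the COLUMNS table are hypothetical names for illustration.

```python
import re

# Same pattern as 'input.regex'; HiveQL's '\\d' is written r'\d' here.
INPUT_REGEX = re.compile(r'^(\d+?),(.*?),"(.*)",([0-9.]*),([0-9]*),([0-9]*).*')

# Hypothetical column table mirroring the DDL: (name, converter).
COLUMNS = [("id", int), ("name", str), ("desc", str),
           ("uqc", float), ("roll", int), ("age", int)]


def deserialize(line):
    """Rough model of RegexSerDe: the regex must match the whole line,
    and capturing group N becomes column N."""
    m = INPUT_REGEX.fullmatch(line)
    if m is None:
        return None  # no match: no column values for this line
    row = {}
    for (name, convert), value in zip(COLUMNS, m.groups()):
        try:
            row[name] = convert(value)
        except ValueError:
            row[name] = None  # value that can't be converted -> NULL
    return row


# Lines as they appear in the file; raw strings keep the literal backslashes.
rows = [
    deserialize(r'226,a1,"\"double bars","item1 and item2\"",0.0,10,25'),
    deserialize(r'227,a2,"\"doubles","item2 & item3 item4\"",0.1,20,35'),
]
for row in rows:
    print(row)

# The header line does not match (it doesn't start with digits).
print(deserialize("id,name,desc,per1,roll,age"))  # None
```

Because RegexSerDe reads the raw line from the file, desc keeps the file's backslashes (for id 226 it is \"double bars","item1 and item2\"); the regexp_replace check shows none only because Hive unescapes \" in the string literal.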
Read this article for more details.