How to handle a delimiter in a column value?

Published: 2021/4/24 21:22:42. Tags: hadoop, hive, hiveql, create-table

Question

I am trying to load CSV file data into my Hive table, but one column's value contains the delimiter (,), so Hive treats it as a delimiter and loads the rest of the value into a new column. I tried using the escape sequence \, but that does not work either: the data after the , is always loaded into a new column.

My CSV file:

        id,name,desc,per1,roll,age
        226,a1,"\"double bars","item1 and item2\"",0.0,10,25
        227,a2,"\"doubles","item2 & item3 item4\"",0.1,20,35
        228,a3,"\"double","item3 & item4 item5\"",0.2,30,45
        229,a4,"\"double","item5 & item6 item7\"",0.3,40,55

I have updated my table:

    create table testing(id int, name string, desc string, uqc double, roll int, age int)
    ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
    WITH SERDEPROPERTIES (
        "separatorChar" = ",",
        "quoteChar" = '"',
        "escapeChar" = "\\"
    ) STORED AS textfile;

But I am still getting the data after the , in a different column.

I am loading the file with the LOAD DATA INPATH command.

Recommended answer

Here is how to create the table based on RegexSerDe.

Each column should have a corresponding capturing group () in the regex. You can easily debug the regex without creating the table by using regexp_replace:

    select regexp_replace('226,a1,"\"double bars","item1 and item2\"",0.0,10,25',
                          '^(\\d+?),(.*?),"(.*)",([0-9.]*),([0-9]*),([0-9]*).*', --6 groups
                          '$1 $2 $3 $4 $5 $6'); --space delimited fields

Result:

    226 a1 "double bars","item1 and item2" 0.0 10 25
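The same check can be run against all four sample rows outside Hive. The following is only a rough sketch: it uses Python's `re` module as a stand-in for the regex engine Hive uses (an assumption that should hold for this simple pattern), and the helper name `split_row` is hypothetical. The rows are copied verbatim from the question's file, so the captured description keeps its backslashes, unlike the result shown above.

```python
import re

# Same pattern as in the regexp_replace call above, written as a Python raw
# string, so each backslash appears once instead of the doubled HiveQL form.
PATTERN = re.compile(r'^(\d+?),(.*?),"(.*)",([0-9.]*),([0-9]*),([0-9]*).*')

# Data rows copied from the question's CSV file, backslashes included.
ROWS = [
    r'226,a1,"\"double bars","item1 and item2\"",0.0,10,25',
    r'227,a2,"\"doubles","item2 & item3 item4\"",0.1,20,35',
    r'228,a3,"\"double","item3 & item4 item5\"",0.2,30,45',
    r'229,a4,"\"double","item5 & item6 item7\"",0.3,40,55',
]


def split_row(line):
    """Return the six captured groups for one line, or None if it does not match."""
    match = PATTERN.match(line)
    return match.groups() if match else None


parsed = [split_row(row) for row in ROWS]
for fields in parsed:
    print(fields)
```

Each row should yield exactly six groups, one per table column. The header line `id,name,desc,per1,roll,age` does not match the pattern, because the first group requires digits.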

If it looks good, create the table:

    create external table testing(id int,
                                  name string,
                                  desc string,
                                  uqc double,
                                  roll int,
                                  age int
                                 )
    ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
    WITH SERDEPROPERTIES ('input.regex'='^(\\d+?),(.*?),"(.*)",([0-9.]*),([0-9]*),([0-9]*).*')
    location ....
    TBLPROPERTIES("skip.header.line.count"="1")
    ;

Read this article for more details.
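As background on why the OpenCSVSerde table from the question still splits the description: in the sample rows, the comma between the two quoted pieces sits outside any quotes, so a quote-aware CSV parse sees seven fields instead of six. The sketch below illustrates this with Python's `csv` module, used here only as a stand-in with the same separator, quote, and escape characters; it is not OpenCSVSerde itself, and the two parsers may differ in other details.

```python
import csv

# First data row from the question's CSV file, backslashes included.
line = r'226,a1,"\"double bars","item1 and item2\"",0.0,10,25'

# Settings chosen to mirror the separatorChar / quoteChar / escapeChar
# properties in the question's table definition.
reader = csv.reader([line], delimiter=',', quotechar='"', escapechar='\\')
fields = next(reader)

print(len(fields))
for field in fields:
    print(repr(field))
```

The description comes out as two separate fields, which is consistent with the behavior reported in the question and suggests why a single regex group around the whole quoted span is used instead.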
