Remove surrounding quotes from fields while loading data into Hive

Question

I want to load a table with input data into Hive. The data is in the following format.

"153662";"0002241447";"0"
"153662";"000647036X";"0"
"153662";"0020434901";"0"
"153662";"0020973403";"0"
"153662";"0028604202";"0"
"153662";"0030437512";"0"

I want to load this data into a table with two varchar columns and one int column, but the surrounding double quotes get in the way. I created the following table.

CREATE EXTERNAL TABLE Table(A varchar(50),B varchar(50),C varchar(50))
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ';'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;

But the quotes around each field become part of the field value, as shown below.

"276725"    "034545104X"    "0"
"276726"    "0155061224"    "5"

I want to ignore them. I also want the third field to be read as INT. Currently, when I declare the third column as INT in the table definition, it comes back as NULL.

Recommended answer

You will have to use Csv-Serde for this. It strips the quote character from each field while parsing, instead of splitting on the delimiter alone.

CREATE TABLE Table(A varchar(50),B varchar(50),C varchar(50))
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES
(
    "separatorChar" = ";",
    "quoteChar"     = "\""
)
STORED AS TEXTFILE;
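
The question also asks for the third field as INT. As far as the Hive documentation describes it, OpenCSVSerde treats every column as a string regardless of the declared type, so one common workaround is to keep the SerDe table as a raw staging table and cast in a view on top of it. The following is only a sketch under that assumption; the names ratings_raw and ratings_typed are hypothetical and do not come from the original post.

```sql
-- Hypothetical staging table: OpenCSVSerde strips the quotes,
-- but every column is read back as a string.
CREATE TABLE ratings_raw (a STRING, b STRING, c STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES
(
    "separatorChar" = ";",
    "quoteChar"     = "\""
)
STORED AS TEXTFILE;

-- Hypothetical view that exposes the types the question asks for.
-- CAST yields NULL for any value that is not a valid integer.
CREATE VIEW ratings_typed AS
SELECT
    CAST(a AS VARCHAR(50)) AS a,
    CAST(b AS VARCHAR(50)) AS b,
    CAST(c AS INT)         AS c
FROM ratings_raw;
```

Queries would then read from ratings_typed rather than the raw table, so the cast is written once instead of in every query.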
