Remove surrounding quotes from fields while loading data into Hive
Question
I want to load a table with input data into Hive. My data is in the following format.
"153662";"0002241447";"0"
"153662";"000647036X";"0"
"153662";"0020434901";"0"
"153662";"0020973403";"0"
"153662";"0028604202";"0"
"153662";"0030437512";"0"
I want to load this data into a table with two varchar columns and one int column, but the surrounding double quotes get in the way. I created the following table.
CREATE EXTERNAL TABLE Table(A varchar(50),B varchar(50),C varchar(50))
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ';'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
However, the quotes around each field become part of the field value, as shown below.
"276725" "034545104X" "0"
"276726" "0155061224" "5"
I want to ignore them. I also want the third field to be read as INT. Currently it becomes NULL when I declare the third column as INT while creating the table.
Recommended answer
You will have to use the CSV SerDe (OpenCSVSerde) for this, with the semicolon as the separator character and the double quote as the quote character.
CREATE TABLE Table(A varchar(50),B varchar(50),C varchar(50))
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES
(
"separatorChar" = ";",
"quoteChar" = "\""
)
STORED AS TEXTFILE;
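The question also asks for the third field as INT. OpenCSVSerde reads every column as a string, whatever types the table declares, so one possible way to get a typed column is a view that casts on top of the SerDe table. The following is only a sketch under assumptions: quoted_input is a hypothetical name standing in for the table created above, and quoted_input_typed is a hypothetical view name.

```sql
-- Assumption: quoted_input is a placeholder name for the OpenCSVSerde table
-- defined above, with columns A, B and C all exposed as strings by the SerDe.
-- quoted_input_typed is a hypothetical view name chosen for illustration.
CREATE VIEW quoted_input_typed AS
SELECT
  CAST(A AS VARCHAR(50)) AS A,  -- quotes are already stripped by the SerDe
  CAST(B AS VARCHAR(50)) AS B,
  CAST(C AS INT)         AS C   -- yields NULL for values that are not valid integers
FROM quoted_input;
```

Querying the view instead of the base table should then return C as an integer while leaving the underlying files untouched.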