How do I ignore brackets when loading an external table in Hive?
Problem description
I'm trying to load the output of a Pig script as an external table in Hive. Pig wrote each row enclosed in parentheses (as a tuple?), like this:
(1,2,3,a)
(2,4,5,b)
(4,2,6,c)
I can't find a way to tell Hive to ignore those parentheses. Because the first column is declared as an integer, a value such as "(1" cannot be parsed, so the first column comes back as NULL.
Any thoughts on how to proceed?
I know I can use the FLATTEN operator in Pig, but I would also like to learn how to deal with these files directly from Hive.
Recommended answer
As Ben said, there is no way to do this in one step, but you can do it by creating one more temporary (staging) table in Hive whose first column is a string.
I'm not sure whether the extra table makes things more complicated than necessary, but it worked for me.
create external table A_TEMP (first string,second int,third int,fourth string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
LOCATION '/user/hdfs/Adata';
Place your data under the 'Adata' folder (the LOCATION of A_TEMP), then create the final table:
create external table A (first int,second int,third int,fourth string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
LOCATION '/user/hdfs/Afinaldata';
Now you can insert the data, stripping the leading "(" from the first column and the trailing ")" from the last one:
insert into table A
select cast(substr(first, 2) as int), second, third, substr(fourth, 1, length(fourth) - 1)
from A_TEMP;
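The string handling in this step can be checked on literals before running it against real data. The following is a minimal sketch, assuming a Hive version that accepts SELECT without a FROM clause (0.13 or later); the literals '(1' and 'a)' are just the first and last fields of the sample row (1,2,3,a). Note that substr is 1-indexed, and that substr(first, 2) with no length argument returns everything after the opening parenthesis. Passing a length of length(first) - 2 would return an empty string for a single-digit value such as '(1'.

```sql
-- Sanity check on literals taken from the sample row (1,2,3,a).
select
  cast('(1' as int)                  as raw_cast,     -- NULL: '(1' is not a valid integer
  cast(substr('(1', 2) as int)       as first_fixed,  -- 1: drops the leading '('
  substr('a)', 1, length('a)') - 1)  as fourth_fixed; -- a: drops the trailing ')'
```

If the three columns come back as NULL, 1 and a, the same expressions should behave the same way on the A_TEMP columns.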
I know the type cast will cost some performance, but for the given scenario this is the best I could come up with.