How do I ignore brackets when loading an external table in HIVE?


Question

I'm trying to load the output of a Pig script as an external table in HIVE. Pig encloses each row in parentheses () (tuples?), like this:

(1,2,3,a)
(2,4,5,b)
(4,2,6,c)

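I can't find a way to tell HIVE to ignore those parentheses. As a result the first column comes back as NULL: it is declared as an integer, but its values look like "(1", which is not a valid integer. The last column also keeps the trailing ")".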
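
To make the failure concrete, here is a small Python sketch of how a comma-delimited table with the schema (int, int, int, string) would see each Pig line. It is only a model, not Hive itself: the helper names `parse_int_or_null` and `read_delimited_row` are hypothetical, and it assumes (as the question describes) that a text value that is not a valid integer is read as NULL, represented here by `None`.

```python
# Hypothetical model of a table declared with FIELDS TERMINATED BY ','
# and columns (int, int, int, string) reading Pig's tuple output.

def parse_int_or_null(field):
    """Mimic reading a text field into an INT column: NULL (None) if not an integer."""
    try:
        return int(field)
    except ValueError:
        return None

def read_delimited_row(line):
    """Split on ',' and apply the (int, int, int, string) schema."""
    first, second, third, fourth = line.split(",")
    return (parse_int_or_null(first),
            parse_int_or_null(second),
            parse_int_or_null(third),
            fourth)

print(read_delimited_row("(1,2,3,a)"))  # (None, 2, 3, 'a)')
```

The leading "(" sticks to the first field and the trailing ")" sticks to the last one, which is exactly what the accepted answer below has to strip off.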

Any thoughts on how to proceed?

I know I can use a FLATTEN command in PIG but I would also like to learn how to deal with these files directly from HIVE.

Accepted answer

As Ben said, there is no way to do this in one step, but you can do it by creating an additional temporary (staging) table in HIVE.

The extra table may make this more complicated than it needs to be, but it worked for me.

create external table A_TEMP (first string,second int,third int,fourth string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ',' 
LINES TERMINATED BY '\n'
LOCATION '/user/hdfs/Adata';

Place your data files under the /user/hdfs/Adata folder.

create external table A (first int,second int,third int,fourth string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ',' 
LINES TERMINATED BY '\n'
LOCATION '/user/hdfs/Afinaldata';

Now insert the data, stripping the parentheses and casting the first column to INT:

-- strip the leading '(' from first and the trailing ')' from fourth
insert into table A
select cast(substr(first, 2) as int), second, third, substr(fourth, 1, length(fourth) - 1) from A_TEMP;
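
To check what the INSERT ... SELECT does to each staged row, here is a Python sketch that models Hive's 1-based `substr(str, pos[, len])` and the cast to INT. The helpers `hive_substr`, `to_int_or_null` and `clean_row` are hypothetical names; they approximate Hive only for positive `pos` and assume that a length of zero or less yields an empty string and that casting a non-integer string gives NULL. The last two lines show why a length of `length(first) - 2` keeps too few characters.

```python
# Simplified model of the staging-table cleanup, assuming A_TEMP holds
# first = "(1" and fourth = "a)" as strings for the Pig row (1,2,3,a).

def hive_substr(s, pos, length=None):
    """Approximate Hive substr() for pos >= 1: positions are 1-based."""
    if pos > len(s):
        return ""
    start = pos - 1
    if length is None:
        return s[start:]
    if length <= 0:
        return ""
    return s[start:start + length]

def to_int_or_null(s):
    """Approximate cast(s as int): None (NULL) when s is not an integer."""
    try:
        return int(s)
    except ValueError:
        return None

def clean_row(first, second, third, fourth):
    """Drop the leading '(' from first and the trailing ')' from fourth."""
    return (to_int_or_null(hive_substr(first, 2)),
            second,
            third,
            hive_substr(fourth, 1, len(fourth) - 1))

print(clean_row("(1", 2, 3, "a)"))    # (1, 2, 3, 'a')
print(clean_row("(12", 2, 3, "bc)"))  # (12, 2, 3, 'bc')

# With a length of len(first) - 2 the digits are cut short:
print(repr(hive_substr("(1", 2, len("(1") - 2)))    # '' (the cast then gives NULL)
print(repr(hive_substr("(12", 2, len("(12") - 2)))  # '1' instead of '12'
```

Using the two-argument form `substr(first, 2)` takes everything after the "(", so multi-digit values survive intact.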

I know the type casting will cost some performance, but for this scenario it is the best I could come up with.