Looping using HiveQL
I'm trying to merge 2 datasets, say A and B. Dataset A has a variable "Flag" that takes 2 values. Rather than just merging both datasets together, I'm trying to merge them based on the "Flag" variable.
The merging code is the following:
create table new_data as
select a.*,b.y
from A as a left join B as b
on a.x=b.x
Since I'm running the Hive code through the CLI, I call it with the following command:
hive -f new_data.hql
The loop I use to run the merge for each value of "Flag" is the following:
for flag in 1 2;
do
hive -hivevar flag=$flag -f new_data.hql
done
I put the above code in another ".hql" file and called it:
hive -f loop_data.hql
But it throws an error:
cannot recognize input near 'for' 'flag' 'in'
Can anybody please tell me where I'm making a mistake?
Thanks!
- The for loop is shell syntax, not HiveQL, so Hive cannot parse it. Put the loop logic in a shell script instead.
File Name: loop_data.sh
for flag in 1 2;
do
hive -hivevar flag=$flag -f new_data.hql
done
And execute the script like:
sh loop_data.sh
- Your new_data.hql script creates the table, so running it once per flag would fail on the second pass because the table already exists. Split the DDL and DML into 2 separate scripts, like:
DDL: create_new_data.hql
-- The "where 1 = 0" filter matches no rows, so this creates only the empty table structure.
create table new_data as
select
a.*,
b.y
from
A as a left join
B as b on
a.x = b.x
where
1 = 0;
DML: insert_new_data.hql
insert into new_data
select
a.*,
b.y
from
A as a left join
B as b on
a.x = b.x
where
a.flag = ${hiveconf:flag};
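Hive replaces ${hiveconf:flag} textually before it parses the query, so it can help to preview the resulting text first. A minimal sketch, assuming sed as a stand-in for Hive's own substitution: render_query is a hypothetical helper (not part of Hive), and it only handles simple values such as 1 or 2 that contain no "/" or "&".

```shell
#!/bin/sh
# render_query is a hypothetical helper, not a Hive feature.
# It imitates the textual replacement of ${hiveconf:flag} with sed
# and prints the resulting query text without running anything in Hive.
#   $1 = flag value (simple values only, e.g. 1 or 2)
#   $2 = path to the .hql file
render_query() {
    sed 's/\${hiveconf:flag}/'"$1"'/g' "$2"
}
```

Calling render_query 1 insert_new_data.hql prints the insert statement as it would read for flag 1.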
And update your shell script like:
File Name: loop_new_data.sh
# Create table
hive -f create_new_data.hql
# Insert data
for flag in 1 2;
do
hive -hiveconf flag=$flag -f insert_new_data.hql
done
And execute it like:
sh loop_new_data.sh
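If one of the Hive calls fails, the plain loop keeps going with the next flag. The steps above can be sketched as a function that stops at the first failure instead. This is only a sketch under assumptions: HIVE_CMD and load_new_data are hypothetical names introduced here, and the .hql file names are the ones used above.

```shell
#!/bin/sh
# HIVE_CMD is a hypothetical override hook (not a Hive option);
# it defaults to the real hive CLI.
HIVE_CMD="${HIVE_CMD:-hive}"

# Create the table once, then run one insert per flag value.
# Returns non-zero as soon as any step fails.
load_new_data() {
    "$HIVE_CMD" -f create_new_data.hql || return 1
    for flag in "$@"; do
        if ! "$HIVE_CMD" -hiveconf flag="$flag" -f insert_new_data.hql; then
            echo "insert failed for flag=$flag" >&2
            return 1
        fi
    done
}
```

With the function defined, load_new_data 1 2 runs the same sequence as loop_new_data.sh.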
Let me know if you want more info.