Looping using Hiveql


Problem description


I'm trying to merge two datasets, say A and B. Dataset A has a variable "Flag" which takes two values. Rather than just merging all the data together in one go, I'm trying to merge the two datasets separately for each value of the "Flag" variable.

The merging code is the following:

create table new_data as
select a.*,b.y
from A as a left join B as b
on a.x=b.x

Since I'm running the Hive code through the CLI, I call it with the following command:

hive -f new_data.hql

The looping part of the code I'm calling to merge data based on "Flag" variable is the following:

for flag in 1 2;
do
  hive -hivevar flag=$flag -f new_data.hql
done

I put the above code in another ".hql" file and am calling it like this:

hive -f loop_data.hql

But it throws an error:

cannot recognize input near 'for' 'flag' 'in'

Can anybody please tell me where I'm making a mistake?

Thanks!

Solution

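  1. Put the loop logic in a shell script, not in an .hql file. The for ... do ... done construct is shell syntax, and hive -f hands the whole file to the HiveQL parser, which is why it stops at 'for'.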

File Name: loop_data.sh

for flag in 1 2;
do
  hive -hivevar flag=$flag -f new_data.hql
done

And execute the script like:

sh loop_data.sh

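  2. Your new_data.hql script creates the table, so running it once per flag value fails on the second pass because new_data already exists. Split the DDL and DML into two separate scripts. The DDL script creates an empty table with the right columns (the 1 = 0 filter returns no rows), and the DML script inserts the rows for one flag value. Like: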

DDL: create_new_data.hql

create table new_data as
select 
  a.*,
  b.y
from 
  A as a left join 
  B as b on 
  a.x = b.x
where 
  1 = 0;

DML: insert_new_data.hql

insert into new_data 
select 
  a.*,
  b.y
from 
  A as a left join 
  B as b on 
  a.x = b.x
where
  a.flag = ${hiveconf:flag};
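
To check what each run will look like before involving Hive, the placeholder substitution can be previewed in the shell. The sketch below only imitates the textual replacement of ${hiveconf:flag} using sed; it does not call Hive. preview_query is a hypothetical helper name introduced here, and the sketch assumes flag values are simple tokens such as 1 or 2 (no slashes or ampersands).

```shell
#!/bin/sh
# preview_query FLAG FILE
# Hypothetical helper: prints FILE with every ${hiveconf:flag} replaced by FLAG.
# It mimics the substitution for inspection only and never runs Hive.
preview_query() {
    # [$] matches a literal dollar sign; the flag value is spliced in between
    # the two single-quoted halves of the sed expression.
    sed 's/[$]{hiveconf:flag}/'"$1"'/g' "$2"
}
```

For example, preview_query 1 insert_new_data.hql prints the insert statement with the placeholder replaced by 1.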

And update your shell script like:

File Name: loop_new_data.sh

# Create table
hive -f create_new_data.hql

# Insert data
for flag in 1 2;
do
  hive -hiveconf flag=$flag -f insert_new_data.hql
done

And execute it like:

sh loop_new_data.sh
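
The loop in loop_new_data.sh keeps going even if one of the hive calls fails. A slightly more defensive variant is sketched below, assuming the same insert_new_data.hql script. run_for_flags and the HIVE_BIN variable are hypothetical names introduced here for illustration, and --hiveconf is the option spelling shown in the Hive CLI usage text.

```shell
#!/bin/sh
# run_for_flags SCRIPT FLAG...
# Hypothetical helper: runs SCRIPT once per flag value and stops at the
# first failure instead of continuing with the remaining values.
# HIVE_BIN is an assumed override for the hive executable (defaults to "hive").
run_for_flags() {
    script="$1"
    shift
    for flag in "$@"; do
        if ! "${HIVE_BIN:-hive}" --hiveconf flag="$flag" -f "$script"; then
            echo "hive failed for flag=$flag" >&2
            return 1
        fi
    done
}
```

After the create step, the insert loop would then be a single call such as run_for_flags insert_new_data.hql 1 2, and the non-zero return value can be used to abort the rest of the script.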

Let me know if you want more info.
