How to pass values dynamically in Apache NiFi from ExecuteSQL to SelectHiveQL


Question

I have two tables: one in MySQL, test.employee, and one in Hive, default.dept. I want to pass the empid values from test.employee as a parameter to a query against the Hive table and store the results in HDFS.

ExecuteSQL -> SELECT empid FROM test.employee (returns 10 records)

SelectHiveQL -> SELECT * FROM default.dept WHERE empid = ${empid} (should retrieve 10 records)


Recommended answer

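You can do the following:

  1. ExecuteSQL - retrieve the employee records
  2. ConvertAvroToJson - convert the result so empid can be processed later
  3. SplitJson - split the result into one flow file per row
  4. EvaluateJsonPath - extract the value of empid into a flow file attribute
  5. ReplaceText - set the flow file content to the HiveQL statement (the one shown in the question, using Expression Language for ${empid})
  6. SelectHiveQL - fetch the department records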
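The data transformation performed by steps 2 to 5 can be sketched outside NiFi as plain Python. This is only an illustration of what the flow does to the data, not NiFi code: the function names `split_records` and `build_statements` are hypothetical, and the sketch assumes that ConvertAvroToJson emits a JSON array of row objects and that empid is an integer. In the flow itself, a typical setup would be an EvaluateJsonPath property such as empid = $.empid with Destination set to flowfile-attribute, and a ReplaceText Replacement Value containing the HiveQL statement; check those settings against your NiFi version.

```python
import json

# Template that ReplaceText would render via Expression Language (${empid}).
HIVEQL_TEMPLATE = "SELECT * FROM default.dept WHERE empid = {empid}"


def split_records(avro_as_json):
    """Mimic SplitJson: return one dict per row of the ExecuteSQL result."""
    records = json.loads(avro_as_json)
    # Assumption: a single row may arrive as a bare object instead of an array.
    if isinstance(records, dict):
        records = [records]
    return records


def build_statements(avro_as_json):
    """Mimic EvaluateJsonPath + ReplaceText: one HiveQL statement per row."""
    statements = []
    for record in split_records(avro_as_json):
        # Assumption: empid is numeric; int() rejects anything else.
        empid = int(record["empid"])
        statements.append(HIVEQL_TEMPLATE.format(empid=empid))
    return statements


if __name__ == "__main__":
    sample = json.dumps([{"empid": 101}, {"empid": 102}, {"empid": 103}])
    for statement in build_statements(sample):
        print(statement)
```

Each statement produced this way corresponds to the content of one flow file handed to SelectHiveQL, so ten employee rows lead to ten separate Hive queries.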

Note that this executes a Hive SELECT for each of the empid values, so each execution of SelectHiveQL will produce a single record. I'm not sure how to get a single HiveQL statement instead (given the HiveQL semantics for the IN clause, for example), since it is effectively a join between a "table of constants" and the Hive table. The NiFi processing would also be more complex: you would not want the SplitJson step, and you would likely have to process all the records at once (with ExecuteScript, for example).
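If a single statement is wanted, the string-building part of such a script could look roughly like the following. This is a minimal sketch under stated assumptions, not a tested NiFi flow: `build_in_clause_query` is a hypothetical helper, empid is assumed to be an integer, and the code that would read the flow file and write the result back inside ExecuteScript is deliberately left out. Whether a long IN list performs acceptably in Hive is also something to verify for your own data.

```python
def build_in_clause_query(empids):
    """Collapse all empid values into one HiveQL statement with an IN list."""
    # Assumption: empid is numeric. int() rejects other input, which also
    # keeps arbitrary text out of the generated statement.
    ids = sorted({int(e) for e in empids})
    if not ids:
        raise ValueError("no empid values to query")
    id_list = ", ".join(str(i) for i in ids)
    return f"SELECT * FROM default.dept WHERE empid IN ({id_list})"


if __name__ == "__main__":
    print(build_in_clause_query([103, 101, 102, 101]))
```

Duplicates are dropped and the ids are sorted only to make the generated statement deterministic; neither is required by HiveQL.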
