ExecuteSQL processor returns corrupted data
Question
I have a flow in NiFi in which I use the ExecuteSQL processor to get a merge of all the sub-partitions named dt from a Hive table. For example, my table is partitioned by sikid and dt, so I have dt=1000 under sikid=1 and also dt=1000 under sikid=2.
What I ran is: select * from my_table where dt=1000
Unfortunately, what I got back from the ExecuteSQL processor is corrupted data, including rows with dt=NULL, even though the original table does not contain a single row with dt=NULL.
The DBCPConnectionPool is configured to use the HiveJDBC4 jar.
I later tried the jar that matches my CDH release, but that didn't fix it either.
The ExecuteSQL processor is configured as follows:
Normalize Table/Column Names: true
Use Avro Logical Types: false
Hive version: 1.1.0
CDH: 5.7.1
Any ideas? Thanks!
Apparently, the returned data includes extra rows, a few thousand of them, which is quite weird.
Answer
Eventually, it was solved by setting the Hive property hive.query.result.fileformat=SequenceFile.
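The answer doesn't say why this works. One commonly cited explanation, which is an assumption here and not confirmed in the thread, is this: in Hive 1.x the default for hive.query.result.fileformat is TextFile, where rows are separated by newlines and fields by \x01 (Ctrl-A). A string value containing an embedded newline then gets split into two rows, and the columns missing from each fragment (including the partition column dt, which comes last in select *) come back as NULL. The Python sketch below simulates that failure with a hypothetical, simplified text reader, and contrasts it with a length-prefixed reader that stands in for the record framing a SequenceFile provides. The function names and the framing scheme are illustrative only; this is not Hive's actual serialization code.

```python
# Hive's default text field delimiter is \x01 (Ctrl-A); text rows end with \n.
FIELD_DELIM = "\x01"
ROW_DELIM = "\n"


def write_text_results(rows):
    """Hypothetical simplification of a TextFile result: one line per row."""
    return "".join(FIELD_DELIM.join(str(v) for v in row) + ROW_DELIM for row in rows)


def read_text_results(data, num_columns):
    """Naively parse text results by splitting on newlines, then on \\x01.
    Missing trailing columns are filled with None, mimicking NULL."""
    lines = data.split(ROW_DELIM)
    if lines and lines[-1] == "":
        lines.pop()  # drop the empty string after the final row delimiter
    rows = []
    for line in lines:
        fields = line.split(FIELD_DELIM)
        fields += [None] * (num_columns - len(fields))
        rows.append(tuple(fields[:num_columns]))
    return rows


def write_framed(rows):
    """Length-prefixed records ("<len>:<record>"), a hypothetical stand-in
    for record framing that does not rely on newlines."""
    out = []
    for row in rows:
        record = FIELD_DELIM.join(str(v) for v in row)
        out.append(f"{len(record)}:{record}")
    return "".join(out)


def read_framed(data):
    """Read each record by its length prefix, so embedded newlines survive."""
    rows, pos = [], 0
    while pos < len(data):
        colon = data.index(":", pos)
        length = int(data[pos:colon])
        record = data[colon + 1 : colon + 1 + length]
        rows.append(tuple(record.split(FIELD_DELIM)))
        pos = colon + 1 + length
    return rows


if __name__ == "__main__":
    # Columns: id, comment, sikid, dt (partition columns last, as in select *).
    rows = [(1, "ok", 1, 1000), (2, "line one\nline two", 2, 1000)]
    print(read_text_results(write_text_results(rows), num_columns=4))
    print(read_framed(write_framed(rows)))
```

With the text reader, the two input rows come back as three, and two of them have dt set to None; the length-prefixed reader returns both rows intact. This matches the symptoms in the question (extra rows, dt=NULL) but does not prove that embedded newlines were the cause in this particular table.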