Using a Hive UDF in Impala gives erroneous results in Impala 1.2.4


Problem description



I have two Hive UDFs written in Java that work perfectly well in Hive.

The two functions are complementary to each other:

String myUDF(BigInt)
BigInt myUDFReverso(String)

myUDF(myInput) produces some output, and myUDFReverso(myUDF(myInput)) should give back myInput.
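To make the expected round trip concrete, here is a minimal plain-Java sketch. The question does not say what myUDF actually computes, so base-36 encoding is an assumption made purely for illustration, and plain static methods stand in for the Hive UDF classes so the sketch runs without Hive on the classpath. It also shows that a single extra trailing character, such as the one observed in Impala 1.2.4, is enough to break the reverse function.

```java
// Hypothetical stand-ins for the two Hive UDFs in the question. The real
// encoding used by myUDF is not given; base-36 is an assumption for illustration.
public class RoundTripSketch {
    // Stand-in for String myUDF(BigInt): encode a long as a base-36 string.
    static String myUDF(long input) {
        return Long.toString(input, 36);
    }

    // Stand-in for BigInt myUDFReverso(String): decode a base-36 string.
    static long myUDFReverso(String encoded) {
        return Long.parseLong(encoded, 36);
    }

    public static void main(String[] args) {
        long myInput = 1234567890123L;
        String encoded = myUDF(myInput);

        // The property the question relies on: decoding undoes encoding.
        System.out.println(myUDFReverso(encoded) == myInput); // prints "true"

        // One extra trailing character (here a zero char, as an assumption)
        // is not a valid base-36 digit, so decoding fails.
        try {
            myUDFReverso(encoded + "\0");
            System.out.println("unexpectedly decoded");
        } catch (NumberFormatException e) {
            System.out.println("round trip broken: " + e.getMessage());
        }
    }
}
```

With a stricter decoder the failure is an exception; with a more lenient one it could silently return a different number, which would match "doesn't give back the original answer".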

This works in Hive, but when I try it in Impala (version 1.2.4), myUDF(BigInt) gives the expected answer (the printed output is correct), while passing that answer to myUDFReverso(String) does not give back the original input.

I have noticed that length(myUDF(myInput)) is wrong in Impala 1.2.4: it is one more than expected for every row. It is correct in Hive and also in Impala 2.1.

So I assume some extra (special) character is being appended to the end of myUDF's output in Impala 1.2.4, specifically at the end of the Text value returned by the UDF.

I have built a similar UDF in C++ for Impala 1.2.4, and it works correctly.

All these issues are resolved in Impala 2.1 but I cannot upgrade my cluster to it.

So how do I work around this bug?

Reference: http://www.cloudera.com/content/cloudera/en/documentation/cloudera-impala/v1/v1-2-4/Installing-and-Using-Impala/ciiu_udf.html

Solution

This is IMPALA-1134, which was fixed in Impala 2.1. The returned value is copied incorrectly, so some extra memory may be returned at the end of your string. Previously Impala used getBytes(), whose contract says that only the data up to getLength() is valid, so bytes past that point could end up in the result. One possible workaround is to encode the real length in the output of myUDF and then, in the reversal function, read that length and use only the valid portion. However, this seems pretty tricky. I'd highly recommend finding a way to upgrade to the latest version of Impala, as there have been many bug fixes since 1.4.
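The following plain-Java sketch illustrates both the copying bug and the length-prefix workaround suggested above. FakeText is a hypothetical stand-in that mimics only the getBytes()/getLength() contract of Hadoop's Text; it is not the real class. encodeWithLength and decodeWithLength are hypothetical helpers, and the "length:payload" format is an assumption. The sketch also assumes the payload is ASCII (so character count equals byte count) and models the leaked memory as a single zero byte; the decoder ignores any number of trailing bytes.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class LengthPrefixWorkaround {
    // Hypothetical stand-in for org.apache.hadoop.io.Text: the backing array
    // may be longer than the valid data, and only getLength() bytes count.
    static final class FakeText {
        private final byte[] bytes;
        private final int length;

        FakeText(byte[] bytes, int length) {
            this.bytes = bytes;
            this.length = length;
        }

        byte[] getBytes() { return bytes; }
        int getLength() { return length; }
    }

    // Buggy copy: uses the whole backing array, so trailing bytes leak out.
    static String buggyCopy(FakeText t) {
        return new String(t.getBytes(), StandardCharsets.UTF_8);
    }

    // Correct copy: honors getLength().
    static String correctCopy(FakeText t) {
        return new String(t.getBytes(), 0, t.getLength(), StandardCharsets.UTF_8);
    }

    // Hypothetical workaround, encoder side (inside myUDF): prefix the
    // payload with its length.
    static String encodeWithLength(String payload) {
        return payload.length() + ":" + payload;
    }

    // Hypothetical workaround, decoder side (inside myUDFReverso): read the
    // length prefix and ignore anything after the valid portion.
    static String decodeWithLength(String value) {
        int colon = value.indexOf(':'); // the prefix holds only digits
        int len = Integer.parseInt(value.substring(0, colon));
        return value.substring(colon + 1, colon + 1 + len);
    }

    public static void main(String[] args) {
        byte[] valid = encodeWithLength("abc123").getBytes(StandardCharsets.US_ASCII);
        // Backing buffer one byte longer than the valid data (padded with 0).
        FakeText t = new FakeText(Arrays.copyOf(valid, valid.length + 1), valid.length);

        String leaked = buggyCopy(t);
        // Mirrors the "+1 for every row" length difference from the question.
        System.out.println(leaked.length() - correctCopy(t).length()); // prints 1
        // The reversal side still recovers the original payload.
        System.out.println(decodeWithLength(leaked)); // prints abc123
    }
}
```

The trade-off is that the visible output of myUDF now carries the prefix, so any other consumer of that column would also have to strip it. That is part of why this workaround is awkward compared with upgrading.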

