Hive hash function resulting in 0, null and 1, why?


Problem description




I am using Hive 0.13.1 and hashing a combination of keys with the default Hive hash function.

The query looks something like this:

select hash(date, token1, token2, parameters["a"], parameters["b"], parameters["c"]) from table1;

I ran it on 150M rows. For 60% of the rows, it produced a proper hash. For the remaining rows, it returned 0, null, or 1 as the hash. I looked at the rows that produced the bad hashes and I don't see anything wrong with them. What could be causing this?

Solution

The hash function returns 0 only when all supplied arguments are empty or null.

If you are familiar with Java, you can check the implementation of the hash function.

Internally, the hash function uses ObjectInspectorUtils.hashCode to get the hash code of each supplied field. Use the Java snippet below to test this manually:

import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
import org.apache.hadoop.io.Text;

public class TestHash
{
    public static void main( String[] args )
    {
        // A null field hashes to 0.
        System.out.println( ObjectInspectorUtils.hashCode(null, PrimitiveObjectInspectorFactory.javaStringObjectInspector) );
        // An empty string also hashes to 0. The Text value is paired with the
        // writable string inspector so the object type matches the inspector.
        System.out.println( ObjectInspectorUtils.hashCode(new Text(""), PrimitiveObjectInspectorFactory.writableStringObjectInspector) );
    }
}

Maven dependencies required to run the above program:

<dependency>
    <groupId>org.apache.hive</groupId>
    <artifactId>hive-exec</artifactId>
    <version>2.1.0</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>2.7.2</version>
</dependency>
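
To see why an all-null or all-empty argument list ends up as 0, the calculation can be modelled without any Hive dependency. The sketch below is a simplified approximation, not Hive's actual code: it assumes every argument is a string, that a null field contributes 0, that a string is folded byte by byte with a multiplier of 31, and that the per-field hashes are combined with the same multiplier. The class and method names (HashSketch, fieldHash, combinedHash) are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Simplified model of a multi-argument hash (assumption: string arguments only).
public class HashSketch {

    // Per-field hash: null contributes 0; a string is folded over its UTF-8 bytes.
    static int fieldHash(String value) {
        if (value == null) {
            return 0;
        }
        int r = 0;
        for (byte b : value.getBytes(StandardCharsets.UTF_8)) {
            r = r * 31 + b;
        }
        return r;
    }

    // Combine the per-field hashes left to right with the same multiplier.
    static int combinedHash(String... fields) {
        int r = 0;
        for (String f : fields) {
            r = r * 31 + fieldHash(f);
        }
        return r;
    }

    public static void main(String[] args) {
        // Every field null or empty: each term is 0, so the result is 0.
        System.out.println(combinedHash(null, "", null)); // prints 0
        // One non-empty field is enough to move the result away from 0.
        System.out.println(combinedHash("a", null));      // prints 3007
    }
}
```

Under this model, a row hashes to 0 whenever every hashed column is null or empty, so checking the suspicious rows for missing map keys or empty tokens is a reasonable first step.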
