Consistency of HASH function


Problem description


A pretty simple question: which version of CityHash is hidden behind the HASH function of BigQuery? Is it always the latest (today v1.1), or rather a fixed version?

Now, a little bit of background. I plan on relying heavily on BigQuery to store large sets of data. First, I would like to compute a hash value from those data and store it (something like hashed_value = HASH(CONCAT(column_0, column_1))). So far so good. Later, I would like to retrieve rows with a given hash value using a query such as SELECT something FROM [mytable] WHERE hashed_value = HASH(CONCAT('12345', 'foobar')). My concern is that the CityHash webpage specifies that those functions are not meant to be backward compatible. So if BigQuery always relies on the latest version of CityHash, I will not be able to retrieve my data based on the hash value of some computed columns after the next CityHash update, and for my application my large database will essentially become useless.

If so, would it be possible to provide access to a fixed (or backward-compatible) hash function in addition to HASH? One from the SHA or MD families, for example, or even a fixed version of CityHash.

Thank you.

Solution

The CityHash used in BigQuery is the version from http://code.google.com/p/cityhash/. Looking at its history, it seems the hash values can change over time. This might be a good question for: https://groups.google.com/forum/?fromgroups#!forum/cityhash-discuss

BigQuery should support a consistent hash. We do support SHA1, but right now the raw result is unusable because of encoding issues. You can, however, do SELECT TO_BASE64(SHA1(CONCAT('12345', 'foobar')))
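For anyone who wants to compute the same kind of key outside BigQuery, here is a minimal Python sketch of what the expression above appears to do: concatenate the strings, take the SHA-1 digest, and base64-encode the raw bytes. It assumes the strings are encoded as UTF-8 and that TO_BASE64 uses the standard base64 alphabet with padding; neither detail is confirmed in this thread, so check a few values against BigQuery before relying on it. The name stable_hash is a hypothetical helper, not a BigQuery function.

```python
import base64
import hashlib


def stable_hash(*parts: str) -> str:
    """Return base64(SHA-1(concatenated parts)) as ASCII text.

    Hypothetical helper; UTF-8 input encoding is an assumption.
    """
    joined = "".join(parts)  # plain concatenation with no separator, like CONCAT
    digest = hashlib.sha1(joined.encode("utf-8")).digest()  # 20 raw bytes
    return base64.b64encode(digest).decode("ascii")  # 28 printable characters


key = stable_hash("12345", "foobar")
print(key)
```

Because SHA-1 is a fixed, published algorithm, the value for a given input does not depend on a library version, which is the property the question asks for. Note that plain concatenation means ('12345', 'foobar') and ('1234', '5foobar') produce the same key.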

Note that we will likely change SHA1 in the near future to automatically base64 encode the results. I've filed an internal bug to make this change.
