Consistency of the HASH function

Question

A pretty simple question: which version of CityHash is hidden behind the HASH function of BigQuery? Is it always the latest (today v1.1), or rather a fixed version?

Now, a little bit of background. I plan to rely heavily on BigQuery to store large data sets. As a first step, I would like to compute a hash value from those data and store it (something like hashed_value = HASH(CONCAT(column_0, column_1))). So far so good. As a second step, I would like to retrieve rows with a given hash value using a query such as SELECT something FROM [mytable] WHERE hashed_value = HASH(CONCAT('12345', 'foobar')). My concern is that the CityHash webpage specifies that these functions are not supposed to be backward compatible. So if BigQuery always relies on the latest version of CityHash, I will no longer be able to retrieve my data by the hash value of computed columns after the next CityHash update, and for my application my large database will essentially become useless.

If so, would it be possible to give access to a fixed (or backward-compatible) hash function in addition to HASH? One of the SHA or MD families, for example, or even a fixed version of CityHash.

Thanks.

Recommended answer

The CityHash used in BigQuery is the version from http://code.google.com/p/cityhash/. Looking at its history, it seems the hash values can change over time. This might be a good question for the cityhash-discuss group: https://groups.google.com/forum/?fromgroups#!forum/cityhash-discuss

BigQuery should support a consistent hash. We do have support for SHA1, but right now the result is unusable because of encoding issues. You can, however, do SELECT TO_BASE64(SHA1(CONCAT('12345', 'foobar'))).
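
To illustrate why a SHA1-based key stays stable where CityHash might not, here is a minimal Python sketch that computes the same kind of value on the client side: concatenate the column values, take the SHA-1 digest, and base64-encode it. The helper name stable_key is hypothetical, and the sketch assumes UTF-8 string encoding and the standard base64 alphabet. Whether the output matches BigQuery's TO_BASE64(SHA1(...)) byte for byte depends on those assumptions and should be checked against a real query result.

```python
import base64
import hashlib


def stable_key(*columns: str) -> str:
    """Concatenate column values, SHA-1 them, and base64-encode the digest.

    Hypothetical helper mirroring TO_BASE64(SHA1(CONCAT(...))).
    """
    # Assumption: values are encoded as UTF-8 before hashing.
    joined = "".join(columns).encode("utf-8")
    digest = hashlib.sha1(joined).digest()  # 20 raw bytes
    # Assumption: standard base64 alphabet with padding.
    return base64.b64encode(digest).decode("ascii")


key = stable_key("12345", "foobar")
print(key)
```

Because SHA-1 is a fixed, published algorithm, a key computed this way today is the same key any conforming implementation produces later, which is the property the stored hashed_value column needs.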

Note that we will likely change SHA1 in the near future to automatically base64 encode the results. I've filed an internal bug to make this change.
