Consistency of HASH function
Question
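A fairly simple question: which version of CityHash is behind BigQuery's HASH function? Is it always the latest version (v1.1 today), or is it a fixed version?
Now, a little background. I plan to rely heavily on BigQuery to store large data sets. From that data, I would first like to compute some hash values and store them (something like hashed_value = HASH(CONCAT(column_0, column_1))). So far so good.
Later, I would like to retrieve the rows that have a given hash value, with a query such as SELECT something FROM [mytable] WHERE hashed_value = HASH(CONCAT('12345', 'foobar')).
My concern is that the CityHash web page states that these functions are not guaranteed to be backward compatible. So if BigQuery always relies on the latest version of CityHash, I will no longer be able to retrieve my data by the hash of computed columns after the next CityHash update, and for my application the large database would essentially become useless.
If so, would it be possible to provide access to a fixed (or backward-compatible) hash function in addition to HASH? One of the SHA or MD families, for example, or even a fixed version of CityHash.
Thank you.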
Answer
The CityHash used in BigQuery is the version from http://code.google.com/p/cityhash/
Looking at the history, it seems the value can change over time. This might be a good question for: https://groups.google.com/forum/?fromgroups#!forum/cityhash-discuss
BigQuery should support a consistent hash. We do support SHA1, but right now the result is unusable because of encoding issues. You can, however, run SELECT TO_BASE64(SHA1(CONCAT('12345', 'foobar')))
Note that we will likely change SHA1 in the near future to base64-encode its results automatically. I've filed an internal bug to make this change.
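The SHA1-plus-base64 approach suggested above can also be reproduced outside BigQuery, for example to compute the lookup key on the client before building a query. The following is a minimal Python sketch, not BigQuery's implementation. The helper name stable_hash is hypothetical, and the sketch assumes that strings are hashed as UTF-8 bytes and that TO_BASE64 uses the standard base64 alphabet with padding. Whether the output matches BigQuery byte for byte is not confirmed by this thread and should be checked against a real query result.

```python
import base64
import hashlib


def stable_hash(*parts: str) -> str:
    """Return the base64-encoded SHA-1 digest of the concatenated parts.

    Hypothetical helper: intended to mirror
    TO_BASE64(SHA1(CONCAT(part_0, part_1, ...))), assuming UTF-8 input
    and standard (padded) base64 output.
    """
    joined = "".join(parts)  # plays the role of CONCAT(...)
    digest = hashlib.sha1(joined.encode("utf-8")).digest()  # 20 raw bytes
    return base64.b64encode(digest).decode("ascii")


if __name__ == "__main__":
    # Same inputs as the query in the answer.
    print(stable_hash("12345", "foobar"))
```

Because SHA-1 is a fixed, published algorithm, a value stored this way does not depend on which CityHash release sits behind HASH, which is the property the question asks for.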