How does BigQuery's FARM_FINGERPRINT represent a 64-bit *unsigned* int?

Question

BigQuery conveniently includes the FARM_FINGERPRINT function. Here's an excerpt of the documentation for this function:

Description

Computes the fingerprint of the STRING or BYTES input using the Fingerprint64 function from the open-source FarmHash library. The output of this function for a particular input will never change.

Return type

INT64

Note that the return type is INT64, which in BigQuery is a 64-bit signed int.

However, if we look at the actual implementation of Fingerprint64, we can see right in the header file that it returns an unsigned 64-bit int.

The problem: a 64-bit unsigned int has roughly twice the maximum value of a 64-bit signed int (2^64 - 1 versus 2^63 - 1). So, assuming the hash outputs are roughly uniformly distributed, about half the time FARM_FINGERPRINT will produce a value that is outside the representable range of a BigQuery INT64. What does BigQuery do in such cases? It must somehow transform the output of Fingerprint64 to fit into the range of a signed int, but the documentation doesn't say how.
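To make the range argument concrete, here is a small Python sketch of the arithmetic. It does not compute FarmHash itself; it only compares the range of a `uint64_t` (what Fingerprint64 returns) with the range of a signed 64-bit INT64. The constant and function names are hypothetical, chosen for illustration.

```python
# Ranges of 64-bit integer types, derived from the bit width.
UINT64_MAX = 2**64 - 1   # largest value an unsigned 64-bit int (uint64_t) can hold
INT64_MIN = -(2**63)     # smallest signed 64-bit value (BigQuery INT64)
INT64_MAX = 2**63 - 1    # largest signed 64-bit value (BigQuery INT64)

def fits_in_int64(u: int) -> bool:
    """Hypothetical helper: True if an unsigned 64-bit value is representable as INT64."""
    return INT64_MIN <= u <= INT64_MAX

# Every uint64 value in [2**63, 2**64 - 1] overflows INT64: that is exactly
# 2**63 values, i.e. half of the 2**64 possible Fingerprint64 outputs.
unrepresentable = UINT64_MAX - INT64_MAX
print(unrepresentable == 2**63)                               # True
print(fits_in_int64(INT64_MAX), fits_in_int64(INT64_MAX + 1))  # True False
```

The "half the time" figure only holds if the hash values are spread evenly over the whole unsigned range, which is the usual expectation for a fingerprint function.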

One way to do this would be to simply let the value overflow, so that it wraps around into the negative range of the signed int. However, since Fingerprint64 is meant to be a portable function, that seems like a poor design, because its output in BigQuery would then differ from the standard output in other systems. If this discrepancy exists, it should at least be documented with a big fat warning!

Answer

The documentation says it uses the "Fingerprint64 function from the open-source FarmHash library", but it doesn't say that the result is exactly the same as that function's. Since INT64 in BigQuery is signed, it can't represent the same range of values as uint64 (unsigned), so the 64-bit result is interpreted as a two's-complement signed integer, with the most significant bit acting as the sign bit. Outputs at or above 2^63 therefore show up as negative numbers. (Just as @ElliottBrossard and Conrad Lee found.)
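A minimal Python sketch of the two's-complement reinterpretation described in the answer, assuming BigQuery keeps the same 64 bits and only changes how they are read. The function names are hypothetical and this is not BigQuery's actual implementation; it also does not compute FarmHash, it only maps between the unsigned and signed views of the same bit pattern.

```python
import struct

def uint64_to_int64(u: int) -> int:
    """Hypothetical helper: reinterpret an unsigned 64-bit value as a
    two's-complement signed 64-bit value (same bits, top bit = sign)."""
    if not 0 <= u < 2**64:
        raise ValueError("not a uint64 value")
    # Pack the value as 8 little-endian unsigned bytes, then read the
    # same 8 bytes back as a signed 64-bit integer.
    return struct.unpack("<q", struct.pack("<Q", u))[0]

def int64_to_uint64(s: int) -> int:
    """Hypothetical helper: recover the unsigned value from a signed INT64,
    e.g. before comparing with another FarmHash implementation's output."""
    if not -(2**63) <= s < 2**63:
        raise ValueError("not an int64 value")
    return s & 0xFFFFFFFFFFFFFFFF  # same as s % 2**64 for Python ints

print(uint64_to_int64(2**63 - 1))  # 9223372036854775807 (top bit clear: unchanged)
print(uint64_to_int64(2**63))      # -9223372036854775808
print(uint64_to_int64(2**64 - 1))  # -1
print(int64_to_uint64(-1))         # 18446744073709551615
```

Under this assumption, a non-negative FARM_FINGERPRINT result already equals the unsigned Fingerprint64 value, and a negative one differs from it by exactly 2^64, so adding 2^64 to negative results (which is what `int64_to_uint64` does) recovers the portable value.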
