如何在JavaScript中为BigQuery实现T-SQL CHECKSUM()? [英] How to Implement T-SQL CHECKSUM() in JavaScript for BigQuery?

查看:186
本文介绍了如何在JavaScript中为BigQuery实现T-SQL CHECKSUM()?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我要寻找的最终结果是实现

The end result I'm looking for is to implement T-SQL CHECKSUM in BigQuery with a JavaScript UDF. I would settle for having the C/C++ source code to translate but if someone has already done this work then I'd love to use it.

或者,如果有人可以想到一种在Microsoft SQL Server中存储的字符串与BigQuery中的字符串之间创建等效的哈希码的方法,那么这对我也有帮助.

Alternatively, if someone can think of a way to create an equivalent hash code between strings stored in Microsoft SQL Server compared to those in BigQuery then that would help me too.

  • 更新:我已经通过注释中的HABO链接找到了一些源代码,这些注释以T-SQL编写以执行相同的CHECKSUM,但我很难将其转换为JavaScript,而JavaScript本质上无法处理64位整数.我正在处理一些小示例,发现该算法仅对 个字节的低字节有效.
  • 更新2:我对复制此算法感到非常好奇,我可以看到一些确定的模式,但我的大脑无法完成将其提炼成反向工程解决方案的任务.我确实发现BINARY_CHECKSUM()CHECKSUM()返回的内容不同,因此前者所做的工作对后者没有帮助.
  • UPDATE: I've found some source code through HABO's link in the comments which is written in T-SQL to perform the same CHECKSUM but I'm having difficulty converting it to JavaScript which inherently cannot handle 64bit integers. I'm playing with some small examples and have found that the algorithm works on the low nibble of each byte only.
  • UPDATE 2: I got really curious about replicating this algorithm and I can see some definite patterns but my brain isn't up to the task of distilling that into a reverse engineered solution. I did find that BINARY_CHECKSUM() and CHECKSUM() return different things so the work done on the former didn't help me with the latter.

推荐答案

我花了一天的时间进行逆向工程,首先转储单个ASCII字符以及成对的所有结果.这表明每个字符都有其自己独特的"XOR代码",而字母无论大小写都相同.之后,该算法非常简单:将左移4位,并通过查找表中存储的代码进行异或运算.

I spent the day reverse engineering this by first dumping all results for single ASCII characters as well as pairs. This showed that each character has its own distinct "XOR code" and letters have the same one regardless of case. The algorithm was remarkably simple to figure out after that: rotate 4 bits left and xor by the code stored in a lookup table.

var xorcodes = [
    0, 1, 2, 3, 4, 5, 6, 7,
    8, 9, 10, 11, 12, 13, 14, 15,
    16, 17, 18, 19, 20, 21, 22, 23,
    24, 25, 26, 27, 28, 29, 30, 31,
    0, 33, 34, 35, 36, 37, 38, 39,  //  !"#$%&'
    40, 41, 42, 43, 44, 45, 46, 47,  // ()*+,-./
    132, 133, 134, 135, 136, 137, 138, 139,  // 01234567
    140, 141, 48, 49, 50, 51, 52, 53, 54,  // 89:;<=>?@
    142, 143, 144, 145, 146, 147, 148, 149,  // ABCDEFGH
    150, 151, 152, 153, 154, 155, 156, 157,  // IJKLMNOP
    158, 159, 160, 161, 162, 163, 164, 165,  // QRSTUVWX
    166, 167, 55, 56, 57, 58, 59, 60,  // YZ[\]^_`
    142, 143, 144, 145, 146, 147, 148, 149,  // abcdefgh
    150, 151, 152, 153, 154, 155, 156, 157,  // ijklmnop
    158, 159, 160, 161, 162, 163, 164, 165,  // qrstuvwx
    166, 167, 61, 62, 63, 64, 65, 66,  // yz{|}~
];

function rol(x, n) {
    // simulate a rotate shift left (>>> preserves the sign bit)
    return (x<<n) | (x>>>(32-n));
}

function checksum(s) {
    var checksum = 0;
    for (var i = 0; i < s.length; i++) {
        checksum = rol(checksum, 4);

        var c = s.charCodeAt(i);
        var xorcode = 0;
        if (c < xorcodes.length) {
            xorcode = xorcodes[c];
        }
        checksum ^= xorcode;
    }
    return checksum;
};

请参见 https://github.com/neilodonuts/tsql-checksum-javascript了解更多信息.

免责声明:我只使用排序规则设置为SQL_Latin1_General_CP1_CI_AS的SQL Server中与VARCHAR字符串的兼容性.这不适用于多列或整数,但是我确定基础算法使用相同的代码,因此不难发现.由于归类,它似乎也与dbfiddle有所不同: https://github.com/neilodonuts/tsql-checksum-javascript/blob/master/data/dbfiddle-differences.png ...里程可能有所不同!

DISCLAIMER: I've only worked on compatibility with VARCHAR strings in SQL Server with collation set to SQL_Latin1_General_CP1_CI_AS. This won't work with multiple columns or integers but I'm sure the underlying algorithm uses the same codes so it wouldn't be hard to figure out. It also seems to differ from db<>fiddle possibly due to collation: https://github.com/neilodonuts/tsql-checksum-javascript/blob/master/data/dbfiddle-differences.png ... mileage may vary!

这篇关于如何在JavaScript中为BigQuery实现T-SQL CHECKSUM()?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆