SQL way to get the MD5 or SHA1 of an entire row


Question

Is there a "semi-portable" way to get the md5() or the sha1() of an entire row? (Or better, of an entire group of rows ordered by all their fields, i.e. order by 1,2,3,...,n?) Unfortunately, not all DBs are PostgreSQL. I have to deal with at least Microsoft SQL Server, Sybase, and Oracle.

Ideally, I'd like to have an aggregator (server side) and use it to detect changes in groups of rows. For example, in tables that have some timestamp column, I'd like to store a unique signature for, say, each month. Then I could quickly detect months that have changed since my last visit (I am mirroring certain tables to a server running Greenplum) and re-load those.
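The per-month signature idea can be sketched as follows. This is a minimal illustration under stated assumptions, not a solution for SQL Server, Sybase, or Oracle: it uses Python's built-in sqlite3 module as a stand-in database, and the `events` table, its columns, and the `row_group_sha1` aggregate are hypothetical names invented for the example. The aggregate collects the rows of each group, sorts them by all fields, and hashes the result, so the signature does not depend on the order in which rows arrive.

```python
import hashlib
import sqlite3


class RowGroupSha1:
    """Hypothetical aggregate: SHA-1 over a group's rows, sorted by all fields."""

    def __init__(self):
        self.rows = []

    def step(self, *fields):
        # repr() keeps types apart, so 1 and '1' (or NULL and 'None') differ.
        self.rows.append(tuple(repr(f) for f in fields))

    def finalize(self):
        digest = hashlib.sha1()
        # Sorting makes the result independent of row arrival order.
        for row in sorted(self.rows):
            digest.update("\x1f".join(row).encode("utf-8") + b"\x1e")
        return digest.hexdigest()


def build(rows):
    """Create an in-memory stand-in table (assumed schema) and load rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (ts TEXT, id INTEGER, amount REAL)")
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
    return conn


def month_signatures(conn):
    """Return {month: signature}, with one signature per calendar month."""
    conn.create_aggregate("row_group_sha1", -1, RowGroupSha1)
    query = """
        SELECT substr(ts, 1, 7) AS month, row_group_sha1(ts, id, amount)
        FROM events
        GROUP BY month
    """
    return dict(conn.execute(query).fetchall())


source_rows = [
    ("2024-01-05", 1, 10.0),
    ("2024-01-20", 2, 12.5),
    ("2024-02-03", 3, 7.0),
]
mirror_rows = [
    ("2024-01-20", 2, 12.5),  # same January rows, different insert order
    ("2024-01-05", 1, 10.0),
    ("2024-02-03", 3, 7.5),   # February amount differs
]

source = month_signatures(build(source_rows))
mirror = month_signatures(build(mirror_rows))

# Months whose signature differs are the ones that would need re-loading.
changed = sorted(m for m in source if source[m] != mirror.get(m))
print(changed)
```

Only the months listed in `changed` would need to be re-loaded into the mirror. On the databases named in the question, the same shape would need each engine's own hashing function and some way to define or emulate an aggregate.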

I've looked at a few options, e.g. checksum(*) in T-SQL (horror: it's very collision-prone, since it's based on a bunch of XORs and 32-bit values), and hashbytes('MD5', field), but the latter can't be applied to an entire row. And that would give me a solution for just one of the SQL flavors I have to deal with.

Any ideas? Even a solution for just one of the SQL dialects mentioned above would be great.

Recommended answer

You could calculate the hashbytes value for the entire row in an update trigger. I used this as part of an ETL process that previously compared all columns in the tables, and the speed increase was huge.

Hashbytes works on varchar, nvarchar, or varbinary datatypes, and I wanted to compare integer keys and text fields. Casting everything would have been a nightmare, so I used the FOR XML clause in SQL Server as follows:

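-- Recompute the row hash whenever rows are inserted or updated.
-- Note: with no WHERE clause, this rehashes every row in staging_table each time it fires.
CREATE TRIGGER get_hash_value ON staging_table
FOR UPDATE, INSERT AS
UPDATE staging_table
SET sha1_hash = (SELECT hashbytes('sha1', (SELECT col1, col2, col3 FOR XML RAW)))
GO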

Alternatively, if you plan to do many updates on all the rows, you could calculate the values in a similar way outside of a trigger, again using a subquery with the FOR XML clause. If going this route, you can even change it to a SELECT *, but not in the trigger: there, each run would produce a different value, because the sha1_hash column itself would be different each time.
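The SELECT * pitfall described above can be illustrated outside the database. The following is a small Python sketch, not the answer's T-SQL: the `row_hash` helper and the dictionary standing in for a table row are assumptions made for the example. It shows that hashing only the data columns gives a stable value, while feeding the stored hash back into its own input changes the result on every recomputation.

```python
import hashlib


def row_hash(row, exclude=("sha1_hash",)):
    """SHA-1 of a row's columns, skipping the columns named in `exclude`."""
    parts = [f"{name}={row[name]!r}" for name in sorted(row) if name not in exclude]
    return hashlib.sha1("\x1f".join(parts).encode("utf-8")).hexdigest()


# A dictionary standing in for one row of staging_table.
row = {"col1": 1, "col2": "abc", "col3": None, "sha1_hash": None}

# Hashing only the data columns is stable across repeated runs.
row["sha1_hash"] = row_hash(row)
stable = row_hash(row) == row["sha1_hash"]

# Hashing every column (the SELECT * case) includes the stored hash itself,
# so the value changes each time it is recomputed and stored.
first = row_hash(row, exclude=())
row["sha1_hash"] = first
second = row_hash(row, exclude=())

print(stable, first != second)
```

If the hash column is stored in the same table as the data, it has to be left out of the hashed input, which is why the trigger lists col1, col2, col3 explicitly.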

You could also modify the SELECT statement to cover more than one row.
