How can I get a hash of an entire table in postgresql?


Problem description

I would like a fairly efficient way to condense an entire table to a hash value.

I have some tools that generate entire data tables, which can then be used to generate further tables, and so on. I'm trying to implement a simplistic build system to coordinate build runs and avoid repeating work. I want to be able to record hashes of the input tables so that I can later check whether they have changed. Building a table takes minutes or hours, so spending several seconds building hashes is acceptable.

A hack I have used is to just pipe the output of pg_dump to md5sum, but that requires transferring the entire table dump over the network to hash it on the local box. Ideally I'd like to produce the hash on the database server.

The question "Finding the hash value of a row in postgresql" gives me a way to calculate a hash for one row at a time; those row hashes could then be combined somehow.

Any tips would be greatly appreciated.

Edit to post what I ended up with: tinychen's answer didn't work for me directly, because I couldn't use 'plpgsql' apparently. When I implemented the function in SQL instead, it worked, but was very inefficient for large tables. So instead of concatenating all the row hashes and then hashing that, I switched to using a "rolling hash", where the previous hash is concatenated with the text representation of a row and then that is hashed to produce the next hash. This was much better; apparently running md5 on short strings millions of extra times is better than concatenating short strings millions of times.

create function zz_concat(text, text) returns text as 
    'select md5($1 || $2);' language 'sql';

create aggregate zz_hashagg(text) (
    sfunc = zz_concat,
    stype = text,
    initcond = '');
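
To make the fold concrete outside the database, here is a minimal Python sketch of what the `zz_hashagg` aggregate above computes: start from the empty string (the `initcond`), and for each row replace the state with `md5(state || row)`. The sample row strings and the helper names `md5_hex` and `rolling_hash` are hypothetical, and the sketch assumes row text is encoded as UTF-8, which may not match every database encoding.

```python
import hashlib
from typing import Iterable


def md5_hex(text: str) -> str:
    """Return the hex MD5 digest of a string (assumes UTF-8 encoding)."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def rolling_hash(rows: Iterable[str]) -> str:
    """Fold rows as zz_hashagg does: state = md5(state || row)."""
    state = ""  # mirrors initcond = ''
    for row in rows:
        # The state stays 32 hex characters long, so each step hashes a
        # short string instead of growing one huge concatenation.
        state = md5_hex(state + row)
    return state


# Hypothetical text representations of three rows, in a fixed order.
rows = ["(1,apple)", "(2,banana)", "(3,cherry)"]
print(rolling_hash(rows))
# The result depends on row order, so the two digests differ.
print(rolling_hash(rows) == rolling_hash(list(reversed(rows))))
```

Because the result depends on row order, the aggregate needs its input in a deterministic order to give a repeatable hash. In PostgreSQL 9.0 and later that can be done with an `ORDER BY` inside the aggregate call, for example `zz_hashagg(f::text order by id)` for a table `f` with a key column `id`.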

Solution

Create a text-concatenating aggregate function like this:

create function pg_concat( text, text ) returns text as '
begin
    if $1 isnull then
        return $2;
    else
        return $1 || $2;
    end if;
end;' language 'plpgsql';

create function pg_concat_fin(text) returns text as '
begin
    return $1;
end;' language 'plpgsql';

create aggregate pg_concat (
    basetype = text,
    sfunc = pg_concat,
    stype = text,
    finalfunc = pg_concat_fin);

Then you can use the pg_concat aggregate to calculate the table's hash value.

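-- ORDER BY goes inside the aggregate call; a query-level "order by id" is rejected because id is not grouped
select md5(pg_concat(md5(CAST((f.*) AS text)) order by id)) from f;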
