Shorter checksum than MD5


Question

Hello,

I'm looking for a simple way to checksum my data.
Each record is 70 bytes long, so a 32-byte hex
md5sum would increase the size of my MySQL
database a lot.

I'm looking for something that is 5 bytes long.
For the moment I'm just taking part of the hex
MD5 sum (like this: checksum = md5sum[3:8]). I
don't have any duplicates, and I have over 100000
records, but I'm not sure about the future...
Can anybody give me something better? Or point me
to some website?
Thanks!

PS: I use this checksum to periodically compare 2
versions of this DB, which are on 2 sides of a
slow internet connection. My hope is to keep down
unneeded traffic between the 2 servers.

Recommended answer

Hi,

Mercuro wrote:
| I'm looking for a simple way to checksum my data. The data is 70 bytes
| long per record, so a 32 byte hex md5sum would increase the size of my
| mysql db a lot.
|
| I'm looking for something that is 5 bytes long, for the moment I'm just
| taking a part of the hex md5 sum (like this: checksum = md5sum[3:8]). I
| don't have any duplicates, and I have over 100000 records, but I'm not
| sure for the future...
|
| Can anybody give me something better? Or point me to some website?

You could use binascii.crc32(), which generates a 4-byte checksum.

From the Python docs:

  crc32(data[, crc])

  Compute CRC-32, the 32-bit checksum of data, starting with an initial crc.
  This is consistent with the ZIP file checksum. Since the algorithm is
  designed for use as a checksum algorithm, it is not suitable for use as a
  general hash algorithm. Use as follows:

    import binascii

    # On Python 3 the input must be bytes, hence the b"..." literals.
    print(binascii.crc32(b"hello world"))
    # Or, in two pieces:
    crc = binascii.crc32(b"hello")
    crc = binascii.crc32(b" world", crc)
    print(crc)

Cheers,
  Joachim

--
Joachim Bauch
struktur AG, jo**@struktur.de
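
As a rough illustration of the CRC-32 suggestion above, the sketch below packs the checksum into exactly 4 raw bytes, which is the form that would fit a fixed-width binary column. The helper name `crc32_bytes` and the big-endian layout are assumptions made for this example, not part of the original answer.

```python
import binascii
import struct


def crc32_bytes(record: bytes) -> bytes:
    """Return the CRC-32 of `record` as 4 raw bytes (big-endian)."""
    # Mask to 32 bits so the value is unsigned regardless of Python version.
    crc = binascii.crc32(record) & 0xFFFFFFFF
    return struct.pack(">I", crc)


record = b"hello world"
checksum = crc32_bytes(record)
print(len(checksum), checksum.hex())
```

Storing the raw bytes rather than a hex string halves the space needed, at the cost of a column that is not human-readable.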


Mercuro <th**@is.invalid> writes:
| I'm looking for something that is 5 bytes long, for the moment I'm
| just taking a part of the hex md5 sum (like this: checksum =
| md5sum[3:8]). I don't have any duplicates, and I have over 100000
| records, but I'm not sure for the future... Can anybody give me
| something better? Or point me to some website?







For making a smaller hash, I think your approach is a good one since
you can easily increase the length if you need to.

For comparing two databases, maybe there are other options not using a
hash though (e.g. keeping a log of which records have changed since
the last comparison).

--
Brian Gough

Network Theory Ltd,
Publishing the Python Manuals --- http://www.network-theory.co.uk/python/
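
A minimal sketch of the truncation approach discussed in this reply, assuming the digest is stored as raw bytes rather than hex. The function name `short_checksum` and its `length` parameter are hypothetical. Note that 5 raw bytes carry 40 bits, whereas 5 hex characters carry only 20.

```python
import hashlib


def short_checksum(record: bytes, length: int = 5) -> bytes:
    """Return the first `length` raw bytes of the MD5 digest of `record`."""
    if not 1 <= length <= 16:
        raise ValueError("an MD5 digest is 16 bytes long")
    return hashlib.md5(record).digest()[:length]


record = b"x" * 70  # stand-in for one 70-byte record
print(short_checksum(record).hex())
print(short_checksum(record, length=8).hex())
```

Because a longer checksum always begins with the shorter one, the length can be raised later without invalidating comparisons made on the shorter prefix.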


Mercuro <th**@is.invalid> writes:
| I'm looking for a simple way to checksum my data. The data is 70 bytes
| long per record, so a 32 byte hex md5sum would increase the size of my
| mysql db a lot.

If you store the checksum as raw binary instead of hex, an MD5 checksum
is 16 bytes, not 32.

| I'm looking for something that is 5 bytes long, for the moment I'm
| just taking a part of the hex md5 sum (like this: checksum =
| md5sum[3:8]). I don't have any duplicates, and I have over 100000
| records, but I'm not sure for the future...

Using 5 hex digits would give you just 20 bits of hash, so you would
almost definitely get collisions with that many records.
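
The collision claim can be checked with the usual birthday approximation. The sketch below assumes the hash values are uniformly distributed over the whole space. Both function names are made up for this illustration.

```python
import math


def expected_colliding_pairs(records: int, bits: int) -> float:
    """Expected number of record pairs sharing a hash of `bits` bits."""
    return records * (records - 1) / (2 * 2 ** bits)


def collision_probability(records: int, bits: int) -> float:
    """Approximate chance of at least one collision: 1 - exp(-n(n-1)/(2N))."""
    return -math.expm1(-expected_colliding_pairs(records, bits))


for bits in (20, 32, 40):
    pairs = expected_colliding_pairs(100_000, bits)
    prob = collision_probability(100_000, bits)
    print(f"{bits} bits: ~{pairs:.3f} colliding pairs, p = {prob:.4f}")
```

With 100000 records, 20 bits gives thousands of expected colliding pairs. At 32 bits the expected number is still around one, and at 40 bits a collision becomes unlikely but not impossible.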

| PS: I use this checksum to periodically compare 2 versions of this DB,
| which are on 2 sides of a slow internet connection. My hope is to
| keep down unneeded traffic between the 2 servers.







How about putting a timestamp in each record, so you only have to
compare the records that have been updated since the last periodic
comparison?
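
One way the timestamp idea might look in practice is sketched below with an in-memory list standing in for the database table. The `Record` class, its field names, and `changed_since` are hypothetical. The sketch also assumes that timestamps are taken from a single clock, so that values from different writes are comparable.

```python
from dataclasses import dataclass


@dataclass
class Record:
    key: int
    payload: bytes
    updated_at: float  # seconds since the epoch, set on every write


def changed_since(records, last_sync: float):
    """Return the records modified strictly after the previous comparison."""
    return [r for r in records if r.updated_at > last_sync]


records = [
    Record(1, b"alpha", updated_at=100.0),
    Record(2, b"beta", updated_at=250.0),
    Record(3, b"gamma", updated_at=300.0),
]
to_send = changed_since(records, last_sync=200.0)
print([r.key for r in to_send])  # [2, 3]
```

Only the records returned here would need to cross the slow link, and no per-record checksum has to be stored at all.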

Or, if you expect only occasional changes, you could compare hashes of
long runs of records, then narrow down the comparisons to locate the
records that actually differ. You could straightforwardly put a tree
structure over the hashes, but maybe there's some even better way.
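
The following is a single-level sketch of the "hashes of long runs" idea, not the full tree. It assumes both servers hold their records in the same order (for example, sorted by primary key). The function names and the block size are choices made for this example.

```python
import hashlib


def block_digests(records, block_size=1000):
    """Hash each consecutive run of `block_size` records into one digest."""
    digests = []
    for start in range(0, len(records), block_size):
        h = hashlib.md5()
        for rec in records[start:start + block_size]:
            # A length prefix keeps record boundaries unambiguous.
            h.update(len(rec).to_bytes(4, "big"))
            h.update(rec)
        digests.append(h.digest())
    return digests


def differing_blocks(local, remote):
    """Return indices of blocks whose digests differ or exist on one side only."""
    longest = max(len(local), len(remote))
    return [
        i for i in range(longest)
        if i >= len(local) or i >= len(remote) or local[i] != remote[i]
    ]


a = [b"record-%d" % i for i in range(10)]
b = list(a)
b[7] = b"changed"
print(differing_blocks(block_digests(a, 4), block_digests(b, 4)))  # [1]
```

Only the digests travel over the link at first. Records are then compared individually only inside the blocks reported as different, and applying the same step to groups of block digests would give the tree structure mentioned above.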

