MySQL binary against non-binary for hash IDs

Problem description

Suppose I want to use a hash as an ID instead of a numeric value. Would there be a performance advantage to storing it as BINARY rather than non-binary?

CREATE TABLE `test`.`foobar` (
  `id` CHAR(32) BINARY CHARACTER SET ascii COLLATE ascii_bin NOT NULL,
  PRIMARY KEY (`id`)
)
CHARACTER SET ascii;

Solution

Yes. Often a hash digest is stored as the ASCII representation of hex digits, for example MD5 of the word 'hash' is:

0800fc577294c34e0b28ad2839435945

This is a 32-character ASCII string.

But MD5 really produces a 128-bit binary hash value. This should require only 16 bytes to be stored as binary values instead of hex digits. So you can gain some space efficiency by using binary strings.

CREATE TABLE test.foobar (
  id BINARY(16) NOT NULL PRIMARY KEY
);

INSERT INTO test.foobar (id) VALUES (UNHEX(MD5('hash')));
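
To make the size difference concrete, here is a minimal sketch that compares the two representations and then looks the row up again. It assumes the `test.foobar` table and the row inserted above; the column aliases are only illustrative.

```sql
-- Length of the hex text versus the raw digest.
SELECT LENGTH(MD5('hash'))        AS hex_chars,   -- 32
       LENGTH(UNHEX(MD5('hash'))) AS raw_bytes;   -- 16

-- Look the row up by its hex digest and read it back as hex.
-- HEX() returns uppercase digits, so LOWER() restores the usual MD5 form.
SELECT LOWER(HEX(id)) AS id_hex
FROM test.foobar
WHERE id = UNHEX('0800fc577294c34e0b28ad2839435945');
```

The application can keep working with hex strings; only the conversion at the SQL boundary (`UNHEX()` on the way in, `HEX()` on the way out) changes.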


Re. your comments that you are more concerned about performance than space efficiency:

I don't know of any reason that the BINARY data type would be speedier than CHAR.

Being half as large can be an advantage for performance if you use cache buffers effectively. That is, a given amount of cache memory can store twice as many rows worth of BINARY data if the string is half the size of the CHAR needed to store the same value in hex. Likewise the cache memory for the index on that column can store twice as much.

The result is a more effective cache, because a random query has a greater chance of hitting the cached data or index, instead of requiring a disk access. Cache efficiency is important for most database applications, because usually the bottleneck is disk I/O. If you can use cache memory to reduce frequency of disk I/O, it's a much bigger bang for the buck than the choice between one data type or another.
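
One way to check the size claim on your own data is to load the same hashes into both layouts and compare what the server reports. This is only a sketch: the table names `foobar_hex` and `foobar_bin` are hypothetical, and the figures in `information_schema.TABLES` are estimates for InnoDB.

```sql
-- Hypothetical comparison tables; the names are for illustration only.
CREATE TABLE test.foobar_hex (
  id CHAR(32) CHARACTER SET ascii COLLATE ascii_bin NOT NULL PRIMARY KEY
);
CREATE TABLE test.foobar_bin (
  id BINARY(16) NOT NULL PRIMARY KEY
);

-- After loading the same set of hashes into both tables,
-- refresh the statistics and compare the reported sizes.
ANALYZE TABLE test.foobar_hex, test.foobar_bin;

-- For InnoDB the primary key is the clustered index,
-- so its size is counted in data_length.
SELECT table_name, table_rows, data_length, index_length
FROM information_schema.TABLES
WHERE table_schema = 'test'
  AND table_name IN ('foobar_hex', 'foobar_bin');
```

A smaller table means more of its rows fit in the same amount of buffer memory, which is the effect described above.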

As for the difference between a hash string stored in BINARY versus a BIGINT, I would choose BIGINT. The cache efficiency will be even greater, and also on 64-bit processors integer arithmetic and comparisons should be very fast.
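
The answer does not say how a hash would be turned into a BIGINT. One possible approach, shown here purely as an assumption, is to keep only the first 64 bits of the digest. The table name `foobar_int` is hypothetical, and discarding half of the 128 bits makes collisions more likely than with the full MD5 value.

```sql
-- Hypothetical table; BIGINT UNSIGNED holds 8 bytes (64 bits).
CREATE TABLE test.foobar_int (
  id BIGINT UNSIGNED NOT NULL PRIMARY KEY
);

-- Assumption: keep only the first 16 hex digits (64 bits) of the digest.
-- CONV() converts base 16 to base 10 with 64-bit precision.
INSERT INTO test.foobar_int (id)
VALUES (CAST(CONV(LEFT(MD5('hash'), 16), 16, 10) AS UNSIGNED));
```

Whether the smaller key is worth the lost bits depends on how many IDs you expect to store and on whether a collision is tolerable in your application.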

I don't have measurements to support the claims above. The net benefit of choosing one data type over another depends a lot on data patterns and types of queries in your database and application. To get the most precise answer, you must try both solutions and measure the difference.


Re. your supposition that binary string comparison is quicker than default case-insensitive string comparison, I tried the following test:

mysql> SELECT BENCHMARK(100000000, 'foo' = 'FOO');
1 row in set (5.13 sec)

mysql> SELECT BENCHMARK(100000000, 'foo' = BINARY 'FOO');
1 row in set (4.23 sec)

So binary string comparison is 17.5% faster than case-insensitive string comparison. But notice that after evaluating this expression 100 million times, the total difference is still less than 1 second (0.90 sec, or about 9 nanoseconds per comparison). While we can measure the relative difference in speed, the absolute difference in speed is really insignificant.

So I'll reiterate:

  • Measure, don't guess or suppose. Your educated guesses will be wrong a lot of the time. Measure before and after every change you make, so you know how much it helped.
  • Invest your time and attention where you get the greatest bang for the buck.
  • Don't sweat the small stuff. Of course, a tiny difference adds up with enough iterations, but given those iterations, a performance improvement with greater absolute benefit is still preferable.
