Easiest primary key for main table?

Question

My main table, Users, stores information about users. I plan to have a UserId field as the primary key of the table. I have full control of creation and assignment of these keys, and I want to ensure that I assign keys in a way that provides good performance. What should I do?

Recommended answer

You have a few options:

1) The most generic solution is to use UUIDs, as specified in RFC 4122.

For example, you could use a STRING(36) column that stores UUIDs. Or you could store each UUID as a pair of INT64s or as a BYTES(16). There are some pitfalls to using UUIDs, so read the details of this answer.
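As a rough sketch in Python, assuming random (version 4) UUIDs from the standard `uuid` module, this shows how one UUID maps onto each of the three layouts above. The helper names `to_int64`, `uuid_to_columns`, and `uuid_from_int64_pair` are hypothetical; the signed conversion is there because INT64 is a signed 64-bit type.

```python
import uuid

MASK64 = (1 << 64) - 1

def to_int64(x: int) -> int:
    """Reinterpret an unsigned 64-bit value as a signed INT64."""
    return x - (1 << 64) if x >= (1 << 63) else x

def uuid_to_columns(u: uuid.UUID):
    """Return the STRING(36), BYTES(16), and (INT64, INT64) encodings of a UUID."""
    as_string = str(u)               # 36 characters, for a STRING(36) column
    as_bytes = u.bytes               # 16 raw bytes, for a BYTES(16) column
    high = to_int64(u.int >> 64)     # upper 64 bits as signed INT64
    low = to_int64(u.int & MASK64)   # lower 64 bits as signed INT64
    return as_string, as_bytes, (high, low)

def uuid_from_int64_pair(high: int, low: int) -> uuid.UUID:
    """Inverse of the INT64-pair encoding."""
    return uuid.UUID(int=((high & MASK64) << 64) | (low & MASK64))

user_id = uuid.uuid4()  # random (version 4) UUID
s, b, (hi, lo) = uuid_to_columns(user_id)
```

All three encodings are lossless, so the choice is about storage size (36 vs. 16 bytes) and how convenient the column is to query, not about uniqueness.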

2) If you want to save a bit of space and are absolutely sure that you will have fewer than a few billion users, you could use an INT64 and assign UserIds using a random number generator. The reason for the user limit is the Birthday Problem: with random 64-bit keys, the odds of at least one collision reach roughly 40% at about 4.3 billion (2^32) users and 50% at about 5 billion, and they rise quickly from there. If you assign a UserId that has already been assigned to a previous user, your insertion transaction will fail, so you'll need to be prepared for that (by retrying the transaction after generating a new random number).
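A minimal Python sketch of the random-key-plus-retry approach, assuming `secrets.randbits` as the random source. The dictionary standing in for the Users table, `insert_user`, and `DuplicateKeyError` are hypothetical placeholders for your database's insert call and its "row already exists" error. `collision_probability` uses the standard birthday approximation P ≈ 1 − exp(−n(n−1)/2N) with N = 2^64.

```python
import math
import secrets

INT64_MIN = -(2**63)

def random_int64() -> int:
    """Uniform random value over the full signed 64-bit range."""
    return secrets.randbits(64) + INT64_MIN

class DuplicateKeyError(Exception):
    """Hypothetical stand-in for the database's 'row already exists' error."""

def insert_user(table: dict, user_id: int, name: str) -> None:
    # Hypothetical in-memory stand-in for an INSERT transaction:
    # it fails if the primary key is already taken.
    if user_id in table:
        raise DuplicateKeyError(user_id)
    table[user_id] = name

def insert_with_random_id(table: dict, name: str, max_attempts: int = 5,
                          id_source=random_int64) -> int:
    for _ in range(max_attempts):
        user_id = id_source()
        try:
            insert_user(table, user_id, name)
            return user_id
        except DuplicateKeyError:
            continue  # collision: draw a new random UserId and retry
    raise RuntimeError("could not allocate a unique UserId")

def collision_probability(n: int, bits: int = 64) -> float:
    """Birthday approximation: chance of at least one collision among n random keys."""
    return -math.expm1(-n * (n - 1) / (2 * 2**bits))
```

For example, `collision_probability(2**32)` is roughly 0.39, which is where the "few billion users" limit comes from. The `id_source` parameter exists only so the retry path can be exercised with a forced collision.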

3) If there's some column, MyColumn, in the Users table that you would like to use as the primary key (possibly because you know you'll frequently look up entries by this column), but you're concerned that this column may cause hotspots (say, because it's generated sequentially or based on timestamps), then you have two other options:

3a) You could "encrypt" MyColumn and use the result as your primary key. In mathematical terms, you apply an automorphism to the key values, that is, a bijection of the key space onto itself, which scrambles them chaotically while still never assigning the same value twice. In this case you wouldn't need to store MyColumn separately at all: you would store and use only the encrypted version, and decrypt it in your application code when necessary. Note that this encryption doesn't need to be secure; it only needs to scramble the bits of the original value sufficiently, in a reversible way. For example, if your values of MyColumn are integers assigned sequentially, you could simply reverse the bits of MyColumn to create a sufficiently scrambled primary key. If you have a more interesting use case, you could use an encryption algorithm like XTEA.
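A minimal Python sketch of both suggestions, assuming MyColumn is a non-negative integer below 2^64. `reverse_bits64` reverses the bits, and `xtea_encrypt`/`xtea_decrypt` implement the standard 32-round XTEA block cipher on one 64-bit value. The key `KEY`, all function names, and the signed/unsigned helpers are illustrative assumptions. Because INT64 is a signed type, the unsigned result is reinterpreted as signed before storage.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF
DELTA = 0x9E3779B9  # XTEA key-schedule constant
KEY = (0x01234567, 0x89ABCDEF, 0xFEDCBA98, 0x76543210)  # hypothetical 128-bit key

def reverse_bits64(x: int) -> int:
    """Reverse the 64 bits of x; applying it twice returns x."""
    return int(format(x & MASK64, "064b")[::-1], 2)

def to_int64(x: int) -> int:
    """Reinterpret an unsigned 64-bit value as a signed INT64 for storage."""
    return x - (1 << 64) if x >= (1 << 63) else x

def from_int64(x: int) -> int:
    """Reinterpret a stored signed INT64 as unsigned 64-bit again."""
    return x & MASK64

def xtea_encrypt(v: int, key=KEY, rounds: int = 32) -> int:
    v0, v1 = (v >> 32) & MASK32, v & MASK32
    s = 0
    for _ in range(rounds):
        # Python ints are unbounded; masking after each + and ^ step
        # yields the same low 32 bits as the uint32 arithmetic in C.
        v0 = (v0 + ((((v1 << 4) ^ (v1 >> 5)) + v1) ^ (s + key[s & 3]))) & MASK32
        s = (s + DELTA) & MASK32
        v1 = (v1 + ((((v0 << 4) ^ (v0 >> 5)) + v0) ^ (s + key[(s >> 11) & 3]))) & MASK32
    return (v0 << 32) | v1

def xtea_decrypt(v: int, key=KEY, rounds: int = 32) -> int:
    v0, v1 = (v >> 32) & MASK32, v & MASK32
    s = (DELTA * rounds) & MASK32
    for _ in range(rounds):
        # Undo the encryption rounds in reverse order.
        v1 = (v1 - ((((v0 << 4) ^ (v0 >> 5)) + v0) ^ (s + key[(s >> 11) & 3]))) & MASK32
        s = (s - DELTA) & MASK32
        v0 = (v0 - ((((v1 << 4) ^ (v1 >> 5)) + v1) ^ (s + key[s & 3]))) & MASK32
    return (v0 << 32) | v1
```

Usage: store `to_int64(reverse_bits64(n))` or `to_int64(xtea_encrypt(n))` as UserId, and recover `n` with `reverse_bits64(from_int64(uid))` or `xtea_decrypt(from_int64(uid))`. Both mappings are bijections, so unlike option 2, distinct inputs can never collide. Bit reversal sends consecutive values 1, 2, 3 to widely separated high-order positions, and XTEA scrambles every bit.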

3b) Use a compound primary key where the first part is a ShardId, computed as hash(MyColumn) % numShards, and the second part is MyColumn. The hash function spreads rows across the key space, so you don't create a hotspot by allocating all new rows to a single split. More information on this approach can be found here. Note that you do not need to use a cryptographic hash, although md5 or sha512 are fine functions; SpookyHash is a good option too. Picking the right number of shards is an interesting question and can depend on the number of nodes in your instance; it's effectively a trade-off between hotspot-avoiding power (more shards) and read/scan efficiency (fewer shards). If you only have 3 nodes, 8 shards is probably fine. If you have 100 nodes, 32 shards is a reasonable value to try.
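A minimal sketch of the ShardId computation in Python, assuming MD5 from `hashlib` as the hash and `NUM_SHARDS = 8` (the 3-node figure above). `shard_id` and `primary_key` are hypothetical names. Python's built-in `hash()` is deliberately avoided here: for strings it is randomized per process, so it would not give a stable ShardId across application instances.

```python
import hashlib

NUM_SHARDS = 8  # assumption: the suggested value for a ~3-node instance

def shard_id(my_column, num_shards: int = NUM_SHARDS) -> int:
    """Stable ShardId = hash(MyColumn) % numShards, using MD5 as the hash."""
    digest = hashlib.md5(str(my_column).encode("utf-8")).digest()
    # Take the first 8 bytes of the digest as an unsigned integer, then reduce.
    return int.from_bytes(digest[:8], "big") % num_shards

def primary_key(my_column, num_shards: int = NUM_SHARDS):
    """Compound key (ShardId, MyColumn)."""
    return (shard_id(my_column, num_shards), my_column)
```

A point lookup by MyColumn recomputes ShardId first and then reads a single row. Reading rows in MyColumn order, by contrast, has to scan all `NUM_SHARDS` key ranges, which is the read/scan cost of adding more shards.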
