Allocating datastore id using PRNG
Problem description
Google Cloud Datastore documents that if an entity ID needs to be pre-allocated, one should use the allocateIds method:
https://cloud.google.com/datastore/docs/best-practices#keys
That method seems to make a REST or RPC call, which has latency. I'd like to avoid that latency by using a PRNG in my Kubernetes Engine application. Here's the Scala code:
import java.security.SecureRandom

class RandomFactory {
  protected val r = new SecureRandom

  def randomLong: Long = r.nextLong

  def randomLong(min: Long, max: Long): Long =
    // Unfortunately, Java didn't make Random.internalNextLong public,
    // so we have to get to it in an indirect way.
    // longs(streamSize, origin, bound): origin is inclusive, bound is exclusive.
    r.longs(1, min, max).toArray.head

  // id may be any value in the range [1, MAX_SAFE_INTEGER),
  // so that it can be represented in Javascript.
  // TODO: randomId is used in production, and might be susceptible to
  // TODO: blocking if /dev/random does not contain entropy.
  // TODO: Keep an eye on this concern.
  def randomId: Long =
    randomLong(1, RandomFactory.MAX_SAFE_INTEGER)
}

object RandomFactory extends RandomFactory {
  // MAX_SAFE_INTEGER is es6 Number.MAX_SAFE_INTEGER
  val MAX_SAFE_INTEGER = 9007199254740991L
}
I also plan to install haveged in the pod to help with entropy.
I understand allocateIds ensures that an ID is not already in use. But in my particular use case, there are two mitigating factors that let me overlook that concern:
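- Based on the number of entities, the chance of a collision is 1 in 100 million.
- This particular entity type is not essential and can afford a "once in a blue moon" collision.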
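As a rough cross-check of a figure like the one above, here is a minimal sketch of the standard birthday approximation, p ≈ 1 - exp(-n(n-1)/(2N)), for n random IDs drawn from a keyspace of about N = MAX_SAFE_INTEGER values. The object and method names are hypothetical, and this is not necessarily how the 1-in-100-million estimate in the question was derived.

```scala
object CollisionEstimate {
  // Same constant as in the question (ES6 Number.MAX_SAFE_INTEGER).
  val MaxSafeInteger = 9007199254740991L

  // Birthday approximation: probability that at least two of n uniformly
  // random IDs drawn from `keyspace` possible values are equal.
  def collisionProbability(n: Long, keyspace: Double): Double = {
    val exponent = n.toDouble * (n - 1).toDouble / (2.0 * keyspace)
    // expm1 keeps precision when the exponent is tiny.
    -math.expm1(-exponent)
  }
}
```

Under this approximation, roughly 13,400 entities give about a 1 in 100 million chance of any collision at all, and the chance grows with the square of the entity count.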
I am more concerned about even distribution in the keyspace, because that is the normal use-case concern.
Will this approach work, particularly with even distribution in the keyspace? Is the allocateIds method essential, or does it just help developers avoid simple mistakes?
Recommended answer
To get rid of collisions, use more bits: for all practical purposes, 128 bits (see the statistics behind UUID v4) will never generate a collision.
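One way to get that many bits on the JVM is a version 4 UUID, which carries 122 random bits. The sketch below assumes the entity is keyed by a string name rather than a numeric ID (Datastore keys can use either); the object and method names are hypothetical.

```scala
import java.util.UUID

object UuidKeyName {
  // randomUUID() produces a version 4 (random) UUID backed by a
  // cryptographically strong generator; the string form is 36 characters.
  def randomKeyName(): String = UUID.randomUUID().toString
}
```

Because the key is a string, the JavaScript MAX_SAFE_INTEGER limit from the question no longer applies to it.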
Another technique is to insert new entities with a shorter random number and handle the error Cloud Datastore returns if they already exist by trying again with a new ID (until you happen upon one that isn't currently in use).
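The retry idea can be sketched as follows. This is only an illustration under stated assumptions: `tryInsert` is a hypothetical callback standing in for whatever Datastore insert call the application uses, returning true when the ID was free and false when the insert was rejected because the entity already exists. The attempt limit is also an assumption.

```scala
import java.security.SecureRandom
import scala.annotation.tailrec

object RetryInsert {
  private val rng = new SecureRandom
  val MaxSafeInteger = 9007199254740991L

  // Random ID in [1, MaxSafeInteger), as in the question's randomId.
  def randomId(): Long = rng.longs(1, 1L, MaxSafeInteger).findFirst().getAsLong

  // Tries fresh IDs until tryInsert accepts one or the attempts run out.
  // tryInsert is a hypothetical stand-in for the real insert call.
  @tailrec
  def insertWithRetry(
      tryInsert: Long => Boolean,
      nextId: () => Long = () => randomId(),
      attemptsLeft: Int = 5
  ): Option[Long] =
    if (attemptsLeft <= 0) None
    else {
      val id = nextId()
      if (tryInsert(id)) Some(id)
      else insertWithRetry(tryInsert, nextId, attemptsLeft - 1)
    }
}
```

With collisions as rare as described in the question, the loop would almost always succeed on the first attempt, so the retry path costs nothing in the common case.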
As far as the key distribution goes: the keys will be randomly distributed within the key space, which will keep Cloud Datastore happy.
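For anyone who wants to sanity-check the distribution locally, here is a minimal sketch that draws IDs the same way as the question's randomId and counts how many land in each of several equal-width slices of the keyspace. The object and method names are hypothetical, and this is only an informal check, not a statistical test.

```scala
import java.security.SecureRandom

object KeyspaceHistogram {
  val MaxSafeInteger = 9007199254740991L

  // Draws `samples` IDs in [1, MaxSafeInteger) and counts them per bucket.
  def bucketCounts(samples: Int, buckets: Int): Array[Int] = {
    val rng = new SecureRandom
    val counts = new Array[Int](buckets)
    // Width is rounded up so that every ID maps to an index below `buckets`.
    val width = MaxSafeInteger / buckets + 1
    rng.longs(samples.toLong, 1L, MaxSafeInteger).toArray.foreach { id =>
      counts((id / width).toInt) += 1
    }
    counts
  }
}
```

Calling `KeyspaceHistogram.bucketCounts(16000, 16)` should give counts that each hover around 1000, with ordinary random variation between buckets.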