如何在 Go 中生成固定长度的随机字符串? [英] How to generate a random string of a fixed length in Go?

查看:37
本文介绍了如何在 Go 中生成固定长度的随机字符串?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我只想要一个随机字符串(大写或小写),没有数字,在 Go 中.执行此操作的最快和最简单的方法是什么?

解决方案

Paul 的解决方案提供了一个简单的,通用解决方案.

问题要求最快和最简单的方法".让我们也解决最快部分.我们将以迭代的方式获得最终的、最快的代码.可以在答案末尾找到对每次迭代的基准测试.

所有解决方案和基准测试代码都可以在 Go Playground 上找到.Playground 上的代码是一个测试文件,而不是一个可执行文件.您必须将其保存到名为 XX_test.go 的文件中,然后使用

运行它

go test -bench .-基准

前言:

<块引用>

如果您只需要一个随机字符串,最快的解决方案不是首选解决方案.为此,保罗的解决方案是完美的.这就是性能是否重要.尽管前 2 个步骤(字节剩余)可能是可以接受的折衷方案:它们确实将性能提高了大约 50%(请参阅II 中的确切数字.基准 部分),并且它们不会显着增加复杂性.

话虽如此,即使您不需要最快的解决方案,通读此答案也可能具有冒险精神和教育意义.

我.改进

1.创世纪(符文)

提醒一下,我们正在改进的原始通用解决方案是:

func init() {rand.Seed(time.Now().UnixNano())}var letterRunes = []rune("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ")func RandStringRunes(n int) string {b := make([]符文, n)对于 i := 范围 b {b[i] = letterRunes[rand.Intn(len(letterRunes))]}返回字符串(b)}

2.字节

如果选择和组合随机字符串的字符只包含英文字母的大写和小写字母,我们只能使用字节,因为英文字母映射到 UTF-8 中的字节 1 到 1编码(这是 Go 存储字符串的方式).

所以代替:

var 字母 = []rune("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ")

我们可以使用:

var 字母 = []byte("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ")

甚至更好:

const 字母 = abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

现在这已经是一个很大的改进:我们可以将它实现为一个 const(有 string 常量,但 没有切片常量).作为额外的收获,表达式 len(letters) 也将是一个 const!(如果 s 是字符串常量,则表达式 len(s) 是常量.)

费用是多少?什么都没有.strings 可以被索引,索引它的字节,完美,正是我们想要的.

我们的下一个目的地是这样的:

const letterBytes = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";func RandStringBytes(n int) string {b := make([]byte, n)对于 i := 范围 b {b[i] = letterBytes[rand.Intn(len(letterBytes))]}返回字符串(b)}

3.剩余部分

以前的解决方案通过调用rand.Intn 获得一个随机数来指定一个随机字母() 委托给 Rand.Intn() 委托给 Rand.Int31n().

这比 rand.Int63() 产生一个具有 63 个随机位的随机数.

所以我们可以简单地调用 rand.Int63() 并使用除以 len(letterBytes) 后的余数:

func RandStringBytesRmndr(n int) string {b := make([]byte, n)对于 i := 范围 b {b[i] = letterBytes[rand.Int63() % int64(len(letterBytes))]}返回字符串(b)}

这有效并且明显更快,缺点是所有字母的概率不会完全相同(假设 rand.Int63() 产生所有 63 位数字的概率相等).虽然失真非常小,因为52的字母数量比1<<63-1小很多,所以在实践中这完全没问题.>

为了更容易理解这一点:假设您想要一个 0..5 范围内的随机数.使用 3 个随机位,这将产生数字 0..1,其概率是 2..5 范围内的两倍.使用 5 个随机位,0..1 范围内的数字将以 6/32 概率出现,2..5 范围内的数字与 5/32 概率现在更接近期望值.增加位数会使这个变得不那么重要,当达到 63 位时,可以忽略不计.

4.掩蔽

在之前的解决方案的基础上,我们可以通过只使用随机数的最低位来保持字母的均匀分布,因为它需要代表字母的数量.例如,如果我们有 52 个字母,则需要 6 位来表示它:52 = 110100b.所以我们将只使用 rand.Int63() 返回的数字的最低 6 位.并且为了保持字母的平均分配,我们只接受"如果它落在 0..len(letterBytes)-1 范围内,则为数字.如果最低位更大,我们丢弃它并查询一个新的随机数.

注意最低位大于等于len(letterBytes)的几率一般小于0.5(0.25> 平均),这意味着即使是这种情况,重复这个罕见的"case 减少了找不到好数字的机会.n 次重复后,我们仍然没有一个好的索引的机会比 pow(0.5, n) 小得多,这只是一个上限估计.在52个字母的情况下,最低6位不好的几率只有(64-52)/64 = 0.19;这意味着例如在 10 次重复后没有好的数字是 1e-8.

所以这是解决方案:

const letterBytes = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";常量 (letterIdxBits = 6//6 位表示一个字母索引letterIdxMask = 1<<letterIdxBits - 1//全 1 位,与 letterIdxBits 一样多)func RandStringBytesMask(n int) string {b := make([]byte, n)对于我:= 0;我<n;{如果 idx := int(rand.Int63() & letterIdxMask);idx

5.掩蔽改进

之前的解决方案只使用了 rand.Int63() 返回的 63 个随机位中的最低 6 位.这是一种浪费,因为获取随机位是我们算法中最慢的部分.

如果我们有 52 个字母,这意味着 6 位编码一个字母索引.所以 63 个随机位可以指定 63/6 = 10 个不同的字母索引.让我们使用所有这 10 个:

const letterBytes = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";常量 (letterIdxBits = 6//6 位表示一个字母索引letterIdxMask = 1<<letterIdxBits - 1//全 1 位,与 letterIdxBits 一样多letterIdxMax = 63/letterIdxBits//适合 63 位的字母索引数量)func RandStringBytesMaskImpr(n int) string {b := make([]byte, n)//一个 rand.Int63() 生成 63 个随机位,足以容纳 letterIdxMax 字母!对于 i, 缓存, 保持 := n-1, rand.Int63(), letterIdxMax;我>=0;{如果保持 == 0 {缓存,保留 = rand.Int63(), letterIdxMax}if idx := int(cache & letterIdxMask);idx

6.来源

掩蔽改进非常好,我们可以改进的不多.我们可以,但不值得这么复杂.

现在让我们找到其他需要改进的地方.随机数的来源.

有一个 crypto/rand 包,它提供了一个Read(b []byte) 函数,所以我们可以使用它通过一次调用获得我们需要的尽可能多的字节.这对性能没有帮助,因为 crypto/rand 实现了加密安全的伪随机数生成器,因此速度要慢得多.

所以让我们坚持使用 math/rand 包.rand.Rand 使用 rand.Source 作为随机位的来源.rand.Source 是一个接口,它指定了一个 Int63() int64 方法:正是我们在最新解决方案中需要和使用的唯一方法.

所以我们真的不需要 rand.Rand(显式的或全局的,共享 rand 包之一),一个 rand.Source 对我们来说已经足够了:

var src = rand.NewSource(time.Now().UnixNano())func RandStringBytesMaskImprSrc(n int) string {b := make([]byte, n)//一个 src.Int63() 生成 63 个随机位,足以容纳 letterIdxMax 字符!对于 i, 缓存, 保持 := n-1, src.Int63(), letterIdxMax;我>=0;{如果保持 == 0 {缓存,保留 = src.Int63(), letterIdxMax}if idx := int(cache & letterIdxMask);idx

另请注意,最后一个解决方案不需要您初始化(播种)math/rand 包的全局 Rand,因为它没有被使用(和我们的rand.Source 已正确初始化/播种).

这里还有一点要注意:math/rand 的包文档说明:

<块引用>

默认的 Source 可以安全地被多个 goroutines 并发使用.

所以默认源比可能通过rand.NewSource()获取的Source慢,因为默认源必须提供并发访问/使用下的安全性,而 rand.NewSource() 不提供此功能(因此它返回的 Source 更有可能更快).

7.使用 strings.Builder

之前的所有解决方案都返回一个 string,其内容首先构建在一个切片中(Genesis 中的 []rune[]byte 在后续解决方案中),然后转换为 string.这个最终的转换必须复制切片的内容,因为 string 值是不可变的,如果转换不会复制,则不能保证字符串的内容不会通过它的原始切片.有关详细信息,请参阅如何将 utf8 字符串转换为 []byte?golang: []byte(string) vs []byte(*string)).

Go 1.10 引入了 strings.Builder. strings.Builder 是一种新类型,我们可以用来构建string 类似于 bytes.Buffer.在内部它使用 []byte 来构建内容,当我们完成后,我们可以使用它的 Builder.String() 方法.但它很酷的地方在于,它无需执行我们刚刚在上面讨论过的副本就可以做到这一点.之所以敢这样做,是因为用于构建字符串内容的字节片没有暴露,所以保证没有人可以无意或恶意修改它来改变产生的不可变"的内容.字符串.

所以我们的下一个想法不是在切片中构建随机字符串,而是在 strings.Builder 的帮助下,所以一旦我们完成,我们可以获得并返回结果而不需要必须复制它.这可能有助于提高速度,而且肯定有助于内存使用和分配.

func RandStringBytesMaskImprSrcSB(n int) string {sb := strings.Builder{}sb.Grow(n)//一个 src.Int63() 生成 63 个随机位,足以容纳 letterIdxMax 字符!对于 i, 缓存, 保持 := n-1, src.Int63(), letterIdxMax;我>=0;{如果保持 == 0 {缓存,保留 = src.Int63(), letterIdxMax}if idx := int(cache & letterIdxMask);idx

请注意,在创建新的 strings.Buidler 后,我们将其命名为 Builder.Grow() 方法,确保它分配一个足够大的内部切片(以避免在我们添加随机字母时重新分配).

8.模仿"strings.Builderunsafe

strings.Builder 在内部 []byte 中构建字符串,就像我们自己做的一样.所以基本上通过 strings.Builder 做它有一些开销,我们切换到 strings.Builder 的唯一目的是避免切片的最终复制.

strings.Builder 通过使用包unsafe:

//String 返回累积的字符串.func (b *Builder) String() 字符串 {返回 *(*string)(unsafe.Pointer(&b.buf))}

问题是,我们也可以自己做.所以这里的想法是切换回在 []byte 中构建随机字符串,但是当我们完成后,不要将其转换为 string 以返回,但是做一个不安全的转换:获得一个 string ,它指向我们的字节片作为字符串数据.

这是如何做到的:

func RandStringBytesMaskImprSrcUnsafe(n int) string {b := make([]byte, n)//一个 src.Int63() 生成 63 个随机位,足以容纳 letterIdxMax 字符!对于 i, 缓存, 保持 := n-1, src.Int63(), letterIdxMax;我>=0;{如果保持 == 0 {缓存,保留 = src.Int63(), letterIdxMax}if idx := int(cache & letterIdxMask);idx

(9. 使用 rand.Read())

添加了 Go 1.7 a rand.Read() 函数和一个 Rand.Read() 方法.我们应该尝试使用这些在一步中读取尽可能多的字节,以实现更好的性能.

有一个小问题"与此:我们需要多少字节?我们可以说:与输出字母的数量一样多.我们会认为这是一个上限估计,因为字母索引使用少于 8 位(1 字节).但在这一点上,我们已经做得更糟了(因为获取随机位是困难的部分"),而且我们得到的比需要的更多.

另请注意,为了保持所有字母索引的均匀分布,可能会有一些垃圾"索引.我们将无法使用的随机数据,因此我们最终会跳过一些数据,因此当我们遍历所有字节切片时最终会很短.我们需要进一步递归地"获得更多随机字节.现在我们甚至失去了对 rand 包的单一调用";优势...

我们可以有点"优化我们从 math.Rand() 获取的随机数据的使用.我们可以估计需要多少字节(位).1 个字母需要 letterIdxBits 位,我们需要 n 个字母,所以我们需要 n * letterIdxBits/8.0 个字节向上取整.我们可以计算一个随机索引不可用的概率(见上文),所以我们可以请求更多更有可能"的索引.足够了(如果事实证明不是,我们重复这个过程).我们可以将字节切片作为位流"进行处理.例如,我们有一个不错的 3rd 方库:github.com/icza/bitio(披露:我是作者).

但 Benchmark 代码仍然显示我们没有获胜.为什么会这样?

最后一个问题的答案是因为 rand.Read() 使用循环并不断调用 Source.Int63() 直到它填满传递的切片.正是 RandStringBytesMaskImprSrc() 解决方案所做的,没有中间缓冲区,并且没有增加的复杂性.这就是 RandStringBytesMaskImprSrc() 保持在宝座上的原因.是的,与 rand.Read() 不同,RandStringBytesMaskImprSrc() 使用非同步的 rand.Source.但推理仍然适用;如果我们使用 Rand.Read() 而不是 rand.Read()(前者也是不同步的),这就证明了这一点.

二.基准

好吧,是时候对不同的解决方案进行基准测试了.

关键时刻:

BenchmarkRunes-4 2000000 723 ns/op 96 B/op 2 allocs/opBenchmarkBytes-4 3000000 550 ns/op 32 B/op 2 allocs/opBenchmarkBytesRmndr-4 3000000 438 ns/op 32 B/op 2 allocs/opBenchmarkBytesMask-4 3000000 534 ns/op 32 B/op 2 allocs/opBenchmarkBytesMaskImpr-4 10000000 176 ns/op 32 B/op 2 allocs/opBenchmarkBytesMaskImprSrc-4 10000000 139 ns/op 32 B/op 2 allocs/opBenchmarkBytesMaskImprSrcSB-4 10000000 134 ns/op 16 B/op 1 allocs/opBenchmarkBytesMaskImprSrcUnsafe-4 10000000 115 ns/op 16 B/op 1 allocs/op

只需从符文切换到字节,我们立即获得了24%的性能提升,并且内存需求下降到三分之一.

去掉 rand.Intn() 并使用 rand.Int63() 反而会得到另一个 20% 的提升.

屏蔽(在大索引的情况下重复)会稍微减慢(由于重复调用):-22%...

但是当我们使用全部(或大部分)63 个随机位(来自一个 rand.Int63() 调用的 10 个索引)时:这大大加快了速度:3 倍.

如果我们使用(非默认的、新的)rand.Source 而不是 rand.Rand,我们将再次获得 21%.

如果我们使用strings.Builder,我们在速度上获得了微小的3.5%,但我们也达到了50% 减少内存使用和分配!真好!

最后,如果我们敢于使用 unsafe 包而不是 strings.Builder,我们将再次获得不错的 14%.

将最终解决方案与初始解决方案进行比较:RandStringBytesMaskImprSrcUnsafe()RandStringRunes()6.3 倍,使用 六分之一 内存和 一半的分配.任务完成.

I want a random string of characters only (uppercase or lowercase), no numbers, in Go. What is the fastest and simplest way to do this?

解决方案

Paul's solution provides a simple, general solution.

The question asks for the "the fastest and simplest way". Let's address the fastest part too. We'll arrive at our final, fastest code in an iterative manner. Benchmarking each iteration can be found at the end of the answer.

All the solutions and the benchmarking code can be found on the Go Playground. The code on the Playground is a test file, not an executable. You have to save it into a file named XX_test.go and run it with

go test -bench . -benchmem

Foreword:

The fastest solution is not a go-to solution if you just need a random string. For that, Paul's solution is perfect. This is if performance does matter. Although the first 2 steps (Bytes and Remainder) might be an acceptable compromise: they do improve performance by like 50% (see exact numbers in the II. Benchmark section), and they don't increase complexity significantly.

Having said that, even if you don't need the fastest solution, reading through this answer might be adventurous and educational.

I. Improvements

1. Genesis (Runes)

As a reminder, the original, general solution we're improving is this:

func init() {
    rand.Seed(time.Now().UnixNano())
}

var letterRunes = []rune("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ")

func RandStringRunes(n int) string {
    b := make([]rune, n)
    for i := range b {
        b[i] = letterRunes[rand.Intn(len(letterRunes))]
    }
    return string(b)
}

2. Bytes

If the characters to choose from and assemble the random string contains only the uppercase and lowercase letters of the English alphabet, we can work with bytes only because the English alphabet letters map to bytes 1-to-1 in the UTF-8 encoding (which is how Go stores strings).

So instead of:

var letters = []rune("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ")

we can use:

var letters = []byte("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ")

Or even better:

const letters = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

Now this is already a big improvement: we could achieve it to be a const (there are string constants but there are no slice constants). As an extra gain, the expression len(letters) will also be a const! (The expression len(s) is constant if s is a string constant.)

And at what cost? Nothing at all. strings can be indexed which indexes its bytes, perfect, exactly what we want.

Our next destination looks like this:

const letterBytes = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

func RandStringBytes(n int) string {
    b := make([]byte, n)
    for i := range b {
        b[i] = letterBytes[rand.Intn(len(letterBytes))]
    }
    return string(b)
}

3. Remainder

Previous solutions get a random number to designate a random letter by calling rand.Intn() which delegates to Rand.Intn() which delegates to Rand.Int31n().

This is much slower compared to rand.Int63() which produces a random number with 63 random bits.

So we could simply call rand.Int63() and use the remainder after dividing by len(letterBytes):

func RandStringBytesRmndr(n int) string {
    b := make([]byte, n)
    for i := range b {
        b[i] = letterBytes[rand.Int63() % int64(len(letterBytes))]
    }
    return string(b)
}

This works and is significantly faster, the disadvantage is that the probability of all the letters will not be exactly the same (assuming rand.Int63() produces all 63-bit numbers with equal probability). Although the distortion is extremely small as the number of letters 52 is much-much smaller than 1<<63 - 1, so in practice this is perfectly fine.

To make this understand easier: let's say you want a random number in the range of 0..5. Using 3 random bits, this would produce the numbers 0..1 with double probability than from the range 2..5. Using 5 random bits, numbers in range 0..1 would occur with 6/32 probability and numbers in range 2..5 with 5/32 probability which is now closer to the desired. Increasing the number of bits makes this less significant, when reaching 63 bits, it is negligible.

4. Masking

Building on the previous solution, we can maintain the equal distribution of letters by using only as many of the lowest bits of the random number as many is required to represent the number of letters. So for example if we have 52 letters, it requires 6 bits to represent it: 52 = 110100b. So we will only use the lowest 6 bits of the number returned by rand.Int63(). And to maintain equal distribution of letters, we only "accept" the number if it falls in the range 0..len(letterBytes)-1. If the lowest bits are greater, we discard it and query a new random number.

Note that the chance of the lowest bits to be greater than or equal to len(letterBytes) is less than 0.5 in general (0.25 on average), which means that even if this would be the case, repeating this "rare" case decreases the chance of not finding a good number. After n repetition, the chance that we still don't have a good index is much less than pow(0.5, n), and this is just an upper estimation. In case of 52 letters the chance that the 6 lowest bits are not good is only (64-52)/64 = 0.19; which means for example that chances to not have a good number after 10 repetition is 1e-8.

So here is the solution:

const letterBytes = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
const (
    letterIdxBits = 6                    // 6 bits to represent a letter index
    letterIdxMask = 1<<letterIdxBits - 1 // All 1-bits, as many as letterIdxBits
)

func RandStringBytesMask(n int) string {
    b := make([]byte, n)
    for i := 0; i < n; {
        if idx := int(rand.Int63() & letterIdxMask); idx < len(letterBytes) {
            b[i] = letterBytes[idx]
            i++
        }
    }
    return string(b)
}

5. Masking Improved

The previous solution only uses the lowest 6 bits of the 63 random bits returned by rand.Int63(). This is a waste as getting the random bits is the slowest part of our algorithm.

If we have 52 letters, that means 6 bits code a letter index. So 63 random bits can designate 63/6 = 10 different letter indices. Let's use all those 10:

const letterBytes = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
const (
    letterIdxBits = 6                    // 6 bits to represent a letter index
    letterIdxMask = 1<<letterIdxBits - 1 // All 1-bits, as many as letterIdxBits
    letterIdxMax  = 63 / letterIdxBits   // # of letter indices fitting in 63 bits
)

func RandStringBytesMaskImpr(n int) string {
    b := make([]byte, n)
    // A rand.Int63() generates 63 random bits, enough for letterIdxMax letters!
    for i, cache, remain := n-1, rand.Int63(), letterIdxMax; i >= 0; {
        if remain == 0 {
            cache, remain = rand.Int63(), letterIdxMax
        }
        if idx := int(cache & letterIdxMask); idx < len(letterBytes) {
            b[i] = letterBytes[idx]
            i--
        }
        cache >>= letterIdxBits
        remain--
    }

    return string(b)
}

6. Source

The Masking Improved is pretty good, not much we can improve on it. We could, but not worth the complexity.

Now let's find something else to improve. The source of random numbers.

There is a crypto/rand package which provides a Read(b []byte) function, so we could use that to get as many bytes with a single call as many we need. This wouldn't help in terms of performance as crypto/rand implements a cryptographically secure pseudorandom number generator so it's much slower.

So let's stick to the math/rand package. The rand.Rand uses a rand.Source as the source of random bits. rand.Source is an interface which specifies a Int63() int64 method: exactly and the only thing we needed and used in our latest solution.

So we don't really need a rand.Rand (either explicit or the global, shared one of the rand package), a rand.Source is perfectly enough for us:

var src = rand.NewSource(time.Now().UnixNano())

func RandStringBytesMaskImprSrc(n int) string {
    b := make([]byte, n)
    // A src.Int63() generates 63 random bits, enough for letterIdxMax characters!
    for i, cache, remain := n-1, src.Int63(), letterIdxMax; i >= 0; {
        if remain == 0 {
            cache, remain = src.Int63(), letterIdxMax
        }
        if idx := int(cache & letterIdxMask); idx < len(letterBytes) {
            b[i] = letterBytes[idx]
            i--
        }
        cache >>= letterIdxBits
        remain--
    }

    return string(b)
}

Also note that this last solution doesn't require you to initialize (seed) the global Rand of the math/rand package as that is not used (and our rand.Source is properly initialized / seeded).

One more thing to note here: package doc of math/rand states:

The default Source is safe for concurrent use by multiple goroutines.

So the default source is slower than a Source that may be obtained by rand.NewSource(), because the default source has to provide safety under concurrent access / use, while rand.NewSource() does not offer this (and thus the Source returned by it is more likely to be faster).

7. Utilizing strings.Builder

All previous solutions return a string whose content is first built in a slice ([]rune in Genesis, and []byte in subsequent solutions), and then converted to string. This final conversion has to make a copy of the slice's content, because string values are immutable, and if the conversion would not make a copy, it could not be guaranteed that the string's content is not modified via its original slice. For details, see How to convert utf8 string to []byte? and golang: []byte(string) vs []byte(*string).

Go 1.10 introduced strings.Builder. strings.Builder is a new type we can use to build contents of a string similar to bytes.Buffer. Internally it uses a []byte to build the content, and when we're done, we can obtain the final string value using its Builder.String() method. But what's cool in it is that it does this without performing the copy we just talked about above. It dares to do so because the byte slice used to build the string's content is not exposed, so it is guaranteed that no one can modify it unintentionally or maliciously to alter the produced "immutable" string.

So our next idea is to not build the random string in a slice, but with the help of a strings.Builder, so once we're done, we can obtain and return the result without having to make a copy of it. This may help in terms of speed, and it will definitely help in terms of memory usage and allocations.

func RandStringBytesMaskImprSrcSB(n int) string {
    sb := strings.Builder{}
    sb.Grow(n)
    // A src.Int63() generates 63 random bits, enough for letterIdxMax characters!
    for i, cache, remain := n-1, src.Int63(), letterIdxMax; i >= 0; {
        if remain == 0 {
            cache, remain = src.Int63(), letterIdxMax
        }
        if idx := int(cache & letterIdxMask); idx < len(letterBytes) {
            sb.WriteByte(letterBytes[idx])
            i--
        }
        cache >>= letterIdxBits
        remain--
    }

    return sb.String()
}

Do note that after creating a new strings.Buidler, we called its Builder.Grow() method, making sure it allocates a big-enough internal slice (to avoid reallocations as we add the random letters).

8. "Mimicing" strings.Builder with package unsafe

strings.Builder builds the string in an internal []byte, the same as we did ourselves. So basically doing it via a strings.Builder has some overhead, the only thing we switched to strings.Builder for is to avoid the final copying of the slice.

strings.Builder avoids the final copy by using package unsafe:

// String returns the accumulated string.
func (b *Builder) String() string {
    return *(*string)(unsafe.Pointer(&b.buf))
}

The thing is, we can also do this ourselves, too. So the idea here is to switch back to building the random string in a []byte, but when we're done, don't convert it to string to return, but do an unsafe conversion: obtain a string which points to our byte slice as the string data.

This is how it can be done:

func RandStringBytesMaskImprSrcUnsafe(n int) string {
    b := make([]byte, n)
    // A src.Int63() generates 63 random bits, enough for letterIdxMax characters!
    for i, cache, remain := n-1, src.Int63(), letterIdxMax; i >= 0; {
        if remain == 0 {
            cache, remain = src.Int63(), letterIdxMax
        }
        if idx := int(cache & letterIdxMask); idx < len(letterBytes) {
            b[i] = letterBytes[idx]
            i--
        }
        cache >>= letterIdxBits
        remain--
    }

    return *(*string)(unsafe.Pointer(&b))
}

(9. Using rand.Read())

Go 1.7 added a rand.Read() function and a Rand.Read() method. We should be tempted to use these to read as many bytes as we need in one step, in order to achieve better performance.

There is one small "problem" with this: how many bytes do we need? We could say: as many as the number of output letters. We would think this is an upper estimation, as a letter index uses less than 8 bits (1 byte). But at this point we are already doing worse (as getting the random bits is the "hard part"), and we're getting more than needed.

Also note that to maintain equal distribution of all letter indices, there might be some "garbage" random data that we won't be able to use, so we would end up skipping some data, and thus end up short when we go through all the byte slice. We would need to further get more random bytes, "recursively". And now we're even losing the "single call to rand package" advantage...

We could "somewhat" optimize the usage of the random data we acquire from math.Rand(). We may estimate how many bytes (bits) we'll need. 1 letter requires letterIdxBits bits, and we need n letters, so we need n * letterIdxBits / 8.0 bytes rounding up. We can calculate the probability of a random index not being usable (see above), so we could request more that will "more likely" be enough (if it turns out it's not, we repeat the process). We can process the byte slice as a "bit stream" for example, for which we have a nice 3rd party lib: github.com/icza/bitio (disclosure: I'm the author).

But Benchmark code still shows we're not winning. Why is it so?

The answer to the last question is because rand.Read() uses a loop and keeps calling Source.Int63() until it fills the passed slice. Exactly what the RandStringBytesMaskImprSrc() solution does, without the intermediate buffer, and without the added complexity. That's why RandStringBytesMaskImprSrc() remains on the throne. Yes, RandStringBytesMaskImprSrc() uses an unsynchronized rand.Source unlike rand.Read(). But the reasoning still applies; and which is proven if we use Rand.Read() instead of rand.Read() (the former is also unsynchronzed).

II. Benchmark

All right, it's time for benchmarking the different solutions.

Moment of truth:

BenchmarkRunes-4                     2000000    723 ns/op   96 B/op   2 allocs/op
BenchmarkBytes-4                     3000000    550 ns/op   32 B/op   2 allocs/op
BenchmarkBytesRmndr-4                3000000    438 ns/op   32 B/op   2 allocs/op
BenchmarkBytesMask-4                 3000000    534 ns/op   32 B/op   2 allocs/op
BenchmarkBytesMaskImpr-4            10000000    176 ns/op   32 B/op   2 allocs/op
BenchmarkBytesMaskImprSrc-4         10000000    139 ns/op   32 B/op   2 allocs/op
BenchmarkBytesMaskImprSrcSB-4       10000000    134 ns/op   16 B/op   1 allocs/op
BenchmarkBytesMaskImprSrcUnsafe-4   10000000    115 ns/op   16 B/op   1 allocs/op

Just by switching from runes to bytes, we immediately have 24% performance gain, and memory requirement drops to one third.

Getting rid of rand.Intn() and using rand.Int63() instead gives another 20% boost.

Masking (and repeating in case of big indices) slows down a little (due to repetition calls): -22%...

But when we make use of all (or most) of the 63 random bits (10 indices from one rand.Int63() call): that speeds up big time: 3 times.

If we settle with a (non-default, new) rand.Source instead of rand.Rand, we again gain 21%.

If we utilize strings.Builder, we gain a tiny 3.5% in speed, but we also achieved 50% reduction in memory usage and allocations! That's nice!

Finally if we dare to use package unsafe instead of strings.Builder, we again gain a nice 14%.

Comparing the final to the initial solution: RandStringBytesMaskImprSrcUnsafe() is 6.3 times faster than RandStringRunes(), uses one sixth memory and half as few allocations. Mission accomplished.

这篇关于如何在 Go 中生成固定长度的随机字符串?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆