How to generate a random string of a fixed length in Go?


Question


I want a random string of characters only (uppercase or lowercase), no numbers in Golang. What is the fastest and simplest way to do this in Go?

Solution

Paul's solution provides a simple, general solution.

The question asks for "the fastest and simplest way". Let's address that. We'll arrive at the final, fastest code iteratively. Benchmark results for each iteration are at the end of the answer.

All the solutions and the benchmarking code can be found on the Go Playground. The code on the Playground is a test file, not an executable. You have to save it into a file named XX_test.go and run it with: go test -bench .

I. Improvements

1. Genesis (Runes)

As a reminder, the original, general solution we're improving is this:

import (
    "math/rand"
    "time"
)

func init() {
    rand.Seed(time.Now().UnixNano())
}

var letterRunes = []rune("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ")

func RandStringRunes(n int) string {
    b := make([]rune, n)
    for i := range b {
        b[i] = letterRunes[rand.Intn(len(letterRunes))]
    }
    return string(b)
}

2. Bytes

If the characters to choose from when assembling the random string are only the uppercase and lowercase letters of the English alphabet, we can work with bytes alone, because each English letter maps to exactly one byte in the UTF-8 encoding (which is how Go stores strings).

So instead of:

var letters = []rune("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ")

we can use:

var letters = []byte("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ")

Or even better:

const letters = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

Now this is already a big improvement: the alphabet can be a const (there are string constants, but there are no slice constants). As an extra gain, the expression len(letters) will also be a constant! (The expression len(s) is constant if s is a string constant.)

And at what cost? None at all. A string can be indexed, and indexing yields its bytes, which is exactly what we want.
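As a small illustration of the two properties just mentioned (constant length and byte indexing), here is a minimal sketch. The names letterCount and letterAt are made up for this example and are not part of the answer's code.

```go
package main

import "fmt"

const letters = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

// letterCount can be declared as a constant because len of a string
// constant is itself a constant expression.
// (letterCount is a hypothetical name used only in this sketch.)
const letterCount = len(letters)

// letterAt returns the byte stored at position i of the alphabet.
// Indexing a string yields a byte, not a rune.
func letterAt(i int) byte {
	return letters[i]
}

func main() {
	fmt.Println(letterCount)                         // 52
	fmt.Printf("%c %c\n", letterAt(0), letterAt(51)) // a Z
}
```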

Our next destination looks like this:

const letterBytes = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

func RandStringBytes(n int) string {
    b := make([]byte, n)
    for i := range b {
        b[i] = letterBytes[rand.Intn(len(letterBytes))]
    }
    return string(b)
}

3. Remainder

Previous solutions get a random number to designate a random letter by calling rand.Intn() which delegates to Rand.Intn() which delegates to Rand.Int31n().

This is much slower compared to rand.Int63() which produces a random number with 63 random bits.

So we could simply call rand.Int63() and use the remainder after dividing by len(letterBytes):

func RandStringBytesRmndr(n int) string {
    b := make([]byte, n)
    for i := range b {
        b[i] = letterBytes[rand.Int63() % int64(len(letterBytes))]
    }
    return string(b)
}

This works and is significantly faster. The disadvantage is that the letters will not all be exactly equally likely (even assuming rand.Int63() produces all 63-bit numbers with equal probability). The distortion is extremely small, though, because the number of letters, 52, is vastly smaller than 1<<63 - 1, so in practice this is perfectly fine.

To make this easier to understand: let's say you want a random number in the range 0..5. Using 3 random bits (values 0..7) and taking the remainder modulo 6, the numbers 0..1 would come up twice as often as those in 2..5. Using 5 random bits, numbers in the range 0..1 would occur with probability 6/32 and numbers in the range 2..5 with probability 5/32, which is closer to the desired 1/6. Increasing the number of bits keeps shrinking the difference; at 63 bits it is negligible.
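The 3-bit and 5-bit cases above can be checked by brute force. The following sketch simply enumerates every possible bit pattern and counts the remainders; remainderCounts is a hypothetical helper written for this illustration only.

```go
package main

import "fmt"

// remainderCounts enumerates every value representable in `bits` random
// bits and counts how often each remainder modulo m occurs.
// (Hypothetical helper, not part of the answer's code.)
func remainderCounts(bits uint, m int) []int {
	counts := make([]int, m)
	for v := 0; v < 1<<bits; v++ {
		counts[v%m]++
	}
	return counts
}

func main() {
	fmt.Println(remainderCounts(3, 6)) // [2 2 1 1 1 1]
	fmt.Println(remainderCounts(5, 6)) // [6 6 5 5 5 5]
}
```

With 3 bits the values 0 and 1 are hit twice out of 8 draws; with 5 bits they are hit 6 times out of 32 against 5 for the rest, matching the figures in the text.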

4. Masking

Building on the previous solution, we can keep the letters equally distributed by using only as many of the lowest bits of the random number as are required to represent the number of letters. For example, with 52 letters we need 6 bits: 52 = 110100b. So we only use the lowest 6 bits of the number returned by rand.Int63(). And to maintain equal distribution, we only "accept" the number if it falls in the range 0..len(letterBytes)-1. If the lowest bits form a larger number, we discard it and query a new random number.

Note that the chance of the lowest bits being greater than or equal to len(letterBytes) is less than 0.5 whenever the minimum number of bits is used (0.25 on average), so even when it happens, retrying quickly drives down the chance of still not having a good number. After n repetitions, the chance that we still don't have a good index is less than pow(0.5, n), and this is just an upper bound. With 52 letters the chance that the 6 lowest bits are not good is only (64-52)/64 = 0.19, which means, for example, that the chance of not having a good number after 10 repetitions is about 5e-8.

So here is the solution:

const letterBytes = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
const (
    letterIdxBits = 6                    // 6 bits to represent a letter index
    letterIdxMask = 1<<letterIdxBits - 1 // All 1-bits, as many as letterIdxBits
)

func RandStringBytesMask(n int) string {
    b := make([]byte, n)
    for i := 0; i < n; {
        if idx := int(rand.Int63() & letterIdxMask); idx < len(letterBytes) {
            b[i] = letterBytes[idx]
            i++
        }
    }
    return string(b)
}

5. Masking Improved

The previous solution only uses the lowest 6 bits of the 63 random bits returned by rand.Int63(). This is a waste as getting the random bits is the slowest part of our algorithm.

If we have 52 letters, 6 bits encode a letter index. So 63 random bits can designate 63/6 = 10 different letter indices (integer division). Let's use all 10:

const letterBytes = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
const (
    letterIdxBits = 6                    // 6 bits to represent a letter index
    letterIdxMask = 1<<letterIdxBits - 1 // All 1-bits, as many as letterIdxBits
    letterIdxMax  = 63 / letterIdxBits   // # of letter indices fitting in 63 bits
)

func RandStringBytesMaskImpr(n int) string {
    b := make([]byte, n)
    // A rand.Int63() generates 63 random bits, enough for letterIdxMax letters!
    for i, cache, remain := n-1, rand.Int63(), letterIdxMax; i >= 0; {
        if remain == 0 {
            cache, remain = rand.Int63(), letterIdxMax
        }
        if idx := int(cache & letterIdxMask); idx < len(letterBytes) {
            b[i] = letterBytes[idx]
            i--
        }
        cache >>= letterIdxBits
        remain--
    }

    return string(b)
}
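To see what the cache and the shift in the loop above are doing, the sketch below cuts a hand-built 63-bit value into its ten 6-bit chunks. splitIndices is a hypothetical helper for illustration; unlike the real function, it does not reject chunks that are 52 or larger.

```go
package main

import "fmt"

const (
	letterIdxBits = 6                    // 6 bits to represent a letter index
	letterIdxMask = 1<<letterIdxBits - 1 // All 1-bits, as many as letterIdxBits
	letterIdxMax  = 63 / letterIdxBits   // # of letter indices fitting in 63 bits
)

// splitIndices cuts v into letterIdxMax chunks of letterIdxBits bits each,
// lowest chunk first, the same order in which the loop consumes the cache.
// (Hypothetical helper, not part of the answer's code.)
func splitIndices(v int64) []int {
	out := make([]int, 0, letterIdxMax)
	for i := 0; i < letterIdxMax; i++ {
		out = append(out, int(v&letterIdxMask))
		v >>= letterIdxBits
	}
	return out
}

func main() {
	// Hand-built value: chunk 0 holds 1, chunk 1 holds 2, chunk 2 holds 63.
	v := int64(1) | int64(2)<<6 | int64(63)<<12
	fmt.Println(splitIndices(v)) // [1 2 63 0 0 0 0 0 0 0]
}
```

In RandStringBytesMaskImpr the third chunk here (63) would be skipped because it is not a valid index into the 52 letters, while the other chunks would each produce a letter.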

6. Source

The Masking Improved solution is pretty good; there is not much left to improve in it. We could squeeze out a little more, but it is not worth the complexity.

Now let's find something else to improve. The source of random numbers.

There is a crypto/rand package which provides a Read(b []byte) function, so we could use that to get as many bytes as we need with a single call. This wouldn't help in terms of performance, as crypto/rand implements a cryptographically secure pseudorandom number generator, so it's much slower.

So let's stick to the math/rand package. A rand.Rand uses a rand.Source as its source of random bits. rand.Source is an interface that specifies an Int63() int64 method: exactly the one thing we needed and used in our latest solution.

So we don't really need a rand.Rand (either an explicit one or the global, shared one of the rand package); a rand.Source is perfectly enough for us:

var src = rand.NewSource(time.Now().UnixNano())

func RandStringBytesMaskImprSrc(n int) string {
    b := make([]byte, n)
    // A src.Int63() generates 63 random bits, enough for letterIdxMax characters!
    for i, cache, remain := n-1, src.Int63(), letterIdxMax; i >= 0; {
        if remain == 0 {
            cache, remain = src.Int63(), letterIdxMax
        }
        if idx := int(cache & letterIdxMask); idx < len(letterBytes) {
            b[i] = letterBytes[idx]
            i--
        }
        cache >>= letterIdxBits
        remain--
    }

    return string(b)
}

Also note that this last solution doesn't require you to initialize (seed) the global Rand of the math/rand package as that is not used (and our rand.Source is properly initialized / seeded).

One more thing to note here: package doc of math/rand states:

The default Source is safe for concurrent use by multiple goroutines.

So the default source is slower than a Source obtained from rand.NewSource(), because the default source has to provide safety under concurrent access, while a Source returned by rand.NewSource() does not (and is therefore likely to be faster). The flip side is that RandStringBytesMaskImprSrc() as written must not be called from multiple goroutines at once without additional synchronization.

(7. Using rand.Read())

Go 1.7 added a rand.Read() function and a Rand.Read() method. We might be tempted to use these to read as many bytes as we need in one step, in order to achieve better performance.

There is one small "problem" with this: how many bytes do we need? We could say: as many as the number of output letters. This is an overestimate, as a letter index uses fewer than 8 bits (1 byte). But at this point we are already doing worse (as getting the random bits is the "hard part"), because we're getting more than we need.

Also note that to maintain equal distribution of all letter indices, there might be some "garbage" random data that we won't be able to use, so we would end up skipping some data, and thus end up short when we have gone through the whole byte slice. We would need to get more random bytes, "recursively". And now we're even losing the "single call to rand package" advantage...

We could "somewhat" optimize the usage of the random data we acquire from rand.Read(). We can estimate how many bytes (bits) we'll need: 1 letter requires letterIdxBits bits, and we need n letters, so we need n * letterIdxBits / 8.0 bytes, rounded up. We can calculate the probability of a random index not being usable (see above), so we could request somewhat more, an amount that will "more likely" be enough (if it turns out it's not, we repeat the process). We can process the byte slice as a "bit stream", for example, for which we have a nice 3rd party lib: github.com/icza/bitio (disclosure: I'm the author).

But the benchmark still shows we're not winning. Why is that?

The answer is that rand.Read() uses a loop and keeps calling Source.Int63() until it fills the passed slice. That is exactly what the RandStringBytesMaskImprSrc() solution does, without the intermediate buffer and without the added complexity. That's why RandStringBytesMaskImprSrc() remains on the throne. Yes, RandStringBytesMaskImprSrc() uses an unsynchronized rand.Source, unlike rand.Read(). But the reasoning still applies, and it is confirmed if we use Rand.Read() instead of rand.Read() (the former is also unsynchronized).
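The byte estimate described above is plain integer arithmetic. The sketch below shows one way it might be computed; bytesNeeded and bytesWithMargin are hypothetical names, and the margin only covers the expected number of rejected indices, so it is not a guarantee that one read will suffice.

```go
package main

import "fmt"

// bytesNeeded returns ceil(n*idxBits/8): the bytes required if every
// idxBits-wide chunk produced a usable letter index.
// (Hypothetical helper, not part of the answer's code.)
func bytesNeeded(n, idxBits int) int {
	return (n*idxBits + 7) / 8
}

// bytesWithMargin inflates the estimate by the expected number of draws
// per accepted index, which is 2^idxBits / alphabetLen.
// (Hypothetical helper; it covers the average case only.)
func bytesWithMargin(n, idxBits, alphabetLen int) int {
	total := 1 << uint(idxBits)
	draws := (n*total + alphabetLen - 1) / alphabetLen // ceil(n*total/alphabetLen)
	return (draws*idxBits + 7) / 8
}

func main() {
	fmt.Println(bytesNeeded(10, 6))         // 8
	fmt.Println(bytesWithMargin(10, 6, 52)) // 10
}
```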

II. Benchmark

All right, let's benchmark the different solutions.

BenchmarkRunes                   1000000              1703 ns/op
BenchmarkBytes                   1000000              1328 ns/op
BenchmarkBytesRmndr              1000000              1012 ns/op
BenchmarkBytesMask               1000000              1214 ns/op
BenchmarkBytesMaskImpr           5000000               395 ns/op
BenchmarkBytesMaskImprSrc        5000000               303 ns/op

Just by switching from runes to bytes, we immediately have 22% performance gain.

Getting rid of rand.Intn() and using rand.Int63() instead gives another 24% boost.

Masking (and repeating in case of big indices) slows things down a little (due to the repeated calls): -20%...

But when we make use of all (or most) of the 63 random bits (10 indices from one rand.Int63() call), that speeds things up considerably: about 3 times compared with plain masking (1214 ns/op down to 395 ns/op).

And finally, if we settle for a (non-default, new) rand.Source instead of rand.Rand, we again gain 23%.

Comparing the final to the initial solution: RandStringBytesMaskImprSrc() is 5.6 times faster than RandStringRunes().
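For readers who want to time one of the functions without setting up a test file, the standard library's testing.Benchmark can be called from an ordinary program. The sketch below is an assumption-laden illustration, not the benchmark code from the Playground: the string length of 16 is chosen arbitrarily, and the numbers it prints will differ from the table above depending on machine and Go version.

```go
package main

import (
	"fmt"
	"math/rand"
	"testing"
)

const letterBytes = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

// randStringBytes mirrors the RandStringBytes solution from section 2.
func randStringBytes(n int) string {
	b := make([]byte, n)
	for i := range b {
		b[i] = letterBytes[rand.Intn(len(letterBytes))]
	}
	return string(b)
}

func main() {
	// testing.Benchmark runs the function with increasing b.N until the
	// measurement is stable, then reports iterations and ns/op.
	res := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			randStringBytes(16) // length 16 is an arbitrary choice
		}
	})
	fmt.Println(res) // values vary by machine
}
```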
