Does accessing elements of a string as bytes perform a conversion?

Question

In Go, to access elements of a string, we can write:

str := "text"
for i, c := range str {
  // str[i] is of type byte
  // c is of type rune
}

When accessing str[i], does Go perform a conversion from rune to byte? I would guess the answer is yes, but I am not sure. If so, which of the following methods performs better? Is one preferred over the other (in terms of best practice, for example)?

str := "large text"
for i := range str {
  // use str[i]
}

str := "large text"
str2 := []byte(str)
for _, s := range str2 {
  // use s
}

Answer

Which of the following methods is better performance-wise?

Definitely not this one:

str := "large text"
str2 := []byte(str)
for _, s := range str2 {
  // use s
}

Strings are immutable; a []byte is mutable. That means []byte(str) has to make a copy, so the code above copies the entire string. I've found that not knowing when strings get copied is a major source of performance problems with large strings.
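
A minimal sketch of the copy semantics described above. The helper mutateCopy is hypothetical, made up for this illustration. Writing to the converted slice leaves the original string unchanged, which is only possible because the conversion copied the bytes.

```go
package main

import "fmt"

// mutateCopy (hypothetical helper) converts s to a []byte, overwrites the
// first byte of that slice, and returns the modified copy alongside s.
// It assumes s is non-empty; b[0] would panic otherwise.
func mutateCopy(s string) (modified, original string) {
	b := []byte(s) // the conversion copies the bytes of s into a new slice
	b[0] = 'T'     // writing to the copy cannot affect s
	return string(b), s
}

func main() {
	modified, original := mutateCopy("text")
	fmt.Println(modified) // Text
	fmt.Println(original) // text
}
```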

If str2 is never modified, the compiler may be able to optimize away the copy. For this reason, it's better to write the loop like this, so that the byte slice can never be modified:

str := "large text"
for _, s := range []byte(str) {
  // use s
}

That way there's no str2 to possibly be modified later and ruin the optimization.

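But this is a bad idea if the text may contain multi-byte characters: ranging over []byte(str) yields individual bytes, so every multi-byte character is split into its separate bytes. See below.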
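
To make the byte/character mismatch concrete, here is a small sketch that ranges over the same text once as a []byte and once as a string. The function name countBytesAndRunes is hypothetical; len and utf8.RuneCountInString are shown for comparison.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// countBytesAndRunes (hypothetical helper) counts loop iterations when
// ranging over s as a []byte (one per byte) and as a string (one per rune).
func countBytesAndRunes(s string) (nBytes, nRunes int) {
	for range []byte(s) {
		nBytes++
	}
	for range s {
		nRunes++
	}
	return nBytes, nRunes
}

func main() {
	s := "snow ☃ man"
	fmt.Println(countBytesAndRunes(s))             // 12 10
	fmt.Println(len(s), utf8.RuneCountInString(s)) // 12 10
	fmt.Println([]byte("☃"))                       // [226 152 131]: one character, three bytes
}
```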

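As for the byte/rune question: no conversion takes place. Indexing a string with str[i] simply reads the byte stored at index i, while range decodes the UTF-8 text and hands you a rune in c. Performance is not really the consideration here, because the two are not equivalent: c is a rune, and str[i] is a byte. If your string may contain multi-byte characters, you have to work with runes.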
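
One way to confirm the types without any conversion is a sketch like the following: str[i] assigns directly to a byte variable and c to a rune variable, and %T prints the underlying names (byte and rune are aliases for uint8 and int32). The helper describe is hypothetical.

```go
package main

import "fmt"

// describe (hypothetical helper) reports, for each position yielded by
// range, the types of str[i] and of the range value c, as printed by %T.
func describe(str string) []string {
	var out []string
	for i, c := range str {
		var b byte = str[i] // compiles as-is: indexing a string already yields a byte
		var r rune = c      // ranging over a string yields runes
		out = append(out, fmt.Sprintf("%T %T", b, r))
	}
	return out
}

func main() {
	for _, line := range describe("text") {
		fmt.Println(line) // uint8 int32
	}
}
```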

For example:

package main

import (
	"fmt"
)

func main() {
	str := "snow ☃ man"
	for i, c := range str {
		// c is the decoded rune; str[i] is only the byte at index i.
		fmt.Printf("c:%c str[i]:%c\n", c, str[i])
	}
}

$ go run ~/tmp/test.go
c:s str[i]:s
c:n str[i]:n
c:o str[i]:o
c:w str[i]:w
c:  str[i]: 
c:☃ str[i]:â
c:  str[i]: 
c:m str[i]:m
c:a str[i]:a
c:n str[i]:n

Note that str[i] breaks the multi-byte Unicode snowman: it holds only the first byte (0xE2) of the character's three-byte UTF-8 encoding, which %c then prints as â.
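
If you need the whole character as a substring rather than its first byte, a sketch like this slices the string using the byte width reported by utf8.DecodeRuneInString. The helper name splitChars is hypothetical; slicing a string yields a string, so the snowman stays intact.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// splitChars (hypothetical helper) returns each character of str as a
// substring. The width comes from utf8.DecodeRuneInString, so multi-byte
// characters keep all of their bytes.
func splitChars(str string) []string {
	var chars []string
	for i := range str { // i is the byte index where each character starts
		_, size := utf8.DecodeRuneInString(str[i:]) // width in bytes of the character at i
		chars = append(chars, str[i:i+size])
	}
	return chars
}

func main() {
	str := "snow ☃ man"
	chars := splitChars(str)
	fmt.Printf("%q\n", chars[5]) // "☃": all three bytes
	fmt.Printf("%#x\n", str[5])  // 0xe2: only the first byte
}
```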

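In any case there is no real performance difference between using c and using str[i] inside range str: range str already has to do the work of decoding the string character by character rather than byte by byte, and reading str[i] on top of that only reads one byte.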
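
As a rough mental model (a sketch of the semantics, not the compiler's actual implementation), range str behaves like the manual loop in decodeAll below: decode one UTF-8 sequence, report its start index and rune, then advance by its width. The decoding cost is paid whether or not the loop body also reads str[i]. The names decodeAll, rangeAll, and indexedRune are hypothetical.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// indexedRune pairs a starting byte index with the rune decoded there.
type indexedRune struct {
	Index int
	Rune  rune
}

// decodeAll walks str the way `for i, c := range str` does: decode one
// UTF-8 sequence, record (index, rune), then advance by its byte width.
func decodeAll(str string) []indexedRune {
	var out []indexedRune
	for i := 0; i < len(str); {
		r, size := utf8.DecodeRuneInString(str[i:])
		out = append(out, indexedRune{i, r})
		i += size
	}
	return out
}

// rangeAll collects the same pairs with a range loop, for comparison.
func rangeAll(str string) []indexedRune {
	var out []indexedRune
	for i, c := range str {
		out = append(out, indexedRune{i, c})
	}
	return out
}

func main() {
	s := "snow ☃ man"
	fmt.Println(decodeAll(s)[6]) // {8 32}: the space after ☃ starts at byte 8
	fmt.Println(rangeAll(s)[6])  // {8 32}
}
```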
